Abstract

We study the problem of constructing a deterministic polynomial time algorithm that achieves
omniscience, in a rate-optimal manner, among a set of users that are interested in a common file but
each has only partial knowledge about it as side-information. Assuming that the collective information
among all the users is sufficient to allow the reconstruction of the entire file, the goal is to minimize the
(possibly weighted) amount of bits that these users need to exchange over a noiseless public channel in
order for all of them to learn the entire file. Using established connections to the multi-terminal secrecy
problem, our algorithm also implies a polynomial-time method for constructing a maximum size secret
shared key in the presence of an eavesdropper.

We consider the following types of side-information settings: (i) side information in the form of
uncoded fragments/packets of the file, where the users' side-information consists of subsets of the file;
(ii) side information in the form of linearly correlated packets, where the users have access to linear
combinations of the file packets; and (iii) the general setting where the users' side-information has
an arbitrary (i.i.d.) correlation structure. Building on results from combinatorial optimization, we provide
a polynomial-time algorithm (in the number of users) that first finds the optimal rate allocations among
these users, and then determines an explicit transmission scheme (i.e., a description of which user should
transmit what information) for cases (i) and (ii).
Fig. 1. An example of the data exchange problem. A base station has a file formed of four packets $w_1, \ldots, w_4 \in \mathbb{F}_q$ and wants
to deliver it to three users over an unreliable wireless channel. The base station stops transmitting once the users collectively
have all the packets, but may individually have only subsets of the packets. For instance, here the base station stops after user
1, user 2 and user 3 have respectively packets $\{w_2, w_3, w_4\}$, $\{w_1, w_3\}$, and $\{w_1, w_2, w_4\}$, which can now be regarded as side
information. The users can then cooperate among themselves to recover their missing packets. Here, the 3 users can reconcile
their file with the following optimal scheme that minimizes the total amount of communicated bits: user 1 transmits packet $w_4$,
user 2 transmits $w_1 + w_3$, and user 3 transmits $w_2$, where the addition is in the field $\mathbb{F}_q$.



I. Introduction 

In recent years, cellular systems have witnessed significant improvements in terms of data rates and
are approaching the theoretical limits of physical-layer spectral efficiency. At the same
time, the rapid growth in the popularity of data-enabled mobile devices, such as smart phones and tablets,
far beyond the early adoption stage, and the correspondingly increasing demand for throughput are
challenging our ability to meet this demand even with the current highly efficient cellular systems. One
of the major bottlenecks in scaling the throughput with the increasing number of mobile devices is the
last-mile wireless link between the base station and the mobile devices, a resource that is shared among
many users served within the cell. This motivates investigating new ways in which mobile devices can
cooperate among themselves to obtain the desired data in a peer-to-peer fashion without relying solely
on the base station.

An example of such a setting is shown in Figure 1, where a base station wants to deliver the same file
to multiple geographically close users over an unreliable wireless downlink. Such a scenario may occur,
for instance, when co-workers are using their tablets to share and update files stored in the cloud (e.g.,
Dropbox), or when users in the subway or a mall are interested in watching the same popular video.
For our example, let us suppose that the file consists of four equally sized packets $w_1$, $w_2$, $w_3$ and $w_4$
belonging to some finite field $\mathbb{F}_q$. Also, suppose that after a few initial transmission attempts by the base
station, the three users individually receive only parts of the file (see Figure 1), but collectively have the
entire file. Now, if the mobile users are in close vicinity and can communicate with each other, then it
is much more desirable and efficient, in terms of resource usage, to reconcile the file among the users by
letting them "talk" to each other without involving the base station. This cooperation has the following
advantages:

• The connection to the base station is either unavailable after the initial phase of transmission, or it 
is too weak to meet the delay requirement. 

• Transmissions within the close group of users are much more reliable than those from any user to the base
station, due to geographical proximity.

• Local communication among users has a smaller footprint in terms of interference, thus allowing 
one to use the shared resources (code, time or frequency) freely without penalizing the base station's 
resources, i.e., higher resource reuse factor. 

The problem of reconciling a file among multiple wireless users having parts of it while minimizing
the cost in terms of the total number of bits exchanged is known in the literature as the data exchange
problem and was introduced by El Rouayheb et al. in [1]. In terms of the example considered here, if
the 3 users transmit $R_1$, $R_2$ and $R_3$ bits to reconcile the entire file, the data exchange problem would
correspond to minimizing the sum-rate $R_1 + R_2 + R_3$ such that, when the communication is over, all the
users can recover the entire file. It can be shown here that the minimum sum-rate required to reconcile
the file is equal to 3 and can be achieved by the following coding scheme: user 1 transmits packet $w_4$,
user 2 transmits $w_1 + w_3$, and user 3 transmits $w_2$, where the addition is over the underlying field $\mathbb{F}_q$.
This corresponds to the optimal rate allocation $R_1 = R_2 = R_3 = 1$ symbol in $\mathbb{F}_q$.

In a subsequent work, Sprintson et al. [2] proposed a randomized algorithm that with high probability
achieves the minimum number of transmissions, given that the field $\mathbb{F}_q$ is large enough. Courtade et
al. [3] and Tajbakhsh et al. [4] formulated this problem as a linear program (LP) and showed that the
proposed LP, under some additional assumption^1, can be solved in polynomial time. In a more general
setting, one can consider minimizing a different cost function, a "weighted sum rate", i.e., minimizing
$\alpha_1 R_1 + \alpha_2 R_2 + \alpha_3 R_3$ for some non-negative weights $0 \leq \alpha_i < \infty$, $i = 1, 2, 3$, to accommodate the
scenario when transmissions from different users have different costs. This problem was studied by Ozgul
et al. [5], where the authors proposed a randomized algorithm that achieves this goal with high probability
provided that the underlying field size is large enough.

The results above consider only the simple form of side-information in which different users observe
partial uncoded "raw" packets/fragments of the original file. Typically, content distribution networks
use coding, such as Fountain codes or linear network codes, to improve the system efficiency. In such
scenarios, the side-information representing the partial knowledge gained by the users would be coded
and in the form of linear combinations of the original file packets, rather than the raw packets themselves.
The previous two cases of side information ("raw" and coded) can be regarded as special cases of the
more general problem where the side-information has arbitrary correlation among the observed data of
different users and where the goal is to minimize the weighted total communication (or exchange) cost
to achieve omniscience. In [6], Csiszár and Narayan pose a related security problem referred to as the
"multi-terminal key agreement" problem. They show that achieving omniscience with the minimum number of
bits exchanged over the public channel is sufficient to maximize the size of the shared secret key. This
result establishes the connection between the multi-terminal key agreement and the data exchange problems.
The authors in [6] solve the key agreement problem by formulating it as a linear program (LP) with an
exponential number of rate constraints, corresponding to all possible cut-sets that need to be satisfied,
which has exponential complexity.

^1 If users are allowed to split the packets into an arbitrary number of smaller chunks.

In this paper, we make the following contributions. First, we provide a deterministic polynomial time
algorithm^2 for finding an optimal rate allocation, w.r.t. a linear weighted sum-rate cost, that achieves
omniscience among users with arbitrarily correlated side information. For the data exchange problem,
this algorithm computes the optimal rate allocation in polynomial time for the case of linearly coded
side information (including the "raw" packets case) and for general linear cost functions (including
the sum-rate case). Moreover, for the "multi-terminal key agreement" security problem of [6], this
algorithm computes the secret key capacity (maximum key length) in polynomial time. Second, for the
data exchange problem, with raw or linearly coded side-information, we provide efficient methods
for constructing linear network codes that can achieve omniscience among the users at the optimal rates
with finite block lengths and zero error.

The rest of the paper is organized as follows. In Section II we describe the model and formulate the
communication problem. Section III provides the necessary mathematical background in combinatorial
optimization that will be needed for constructing our algorithm. In Section IV we describe the polynomial
time algorithm which finds an optimal rate allocation that minimizes the sum-rate (non-weighted case).
In Section V we use the results of Section IV as a key building block to construct an efficient algorithm
for an arbitrary linear communication cost function. In Section VI we propose a polynomial time code
construction for the data exchange problem using results in network coding. We conclude our work in
Section VII.

^2 The complexity of our proposed algorithm is $O(m^2 \cdot SFM(m))$, where $m$ is the number of users and $SFM(m)$ is the
complexity of submodular function minimization. To the best of our knowledge, the fastest algorithm for SFM is given by Orlin
in [7], and has complexity $O(m^5 \cdot \gamma + m^6)$, where $\gamma$ is the complexity of computing the submodular function.

II. System Model and Preliminaries 

In this paper, we consider a setup with $m$ user terminals that are interested in achieving omniscience
of a particular file or a random process. Let $X_1, X_2, \ldots, X_m$, $m \geq 2$, denote the components of a discrete
memoryless multiple source (DMMS) with a given joint probability mass function. Each user terminal
$i \in \mathcal{M} = \{1, 2, \ldots, m\}$ observes $n$ i.i.d. realizations of the corresponding random variable $X_i$, denoted by $X_i^n$. The
final goal is for each terminal in the system to gain access to all other terminals' observations, i.e., to
become omniscient about the file or DMMS. In order to achieve this goal the terminals are allowed to
communicate over a noiseless public broadcast channel in multiple rounds and thus may use interactive
communication, meaning that the transmission by a user terminal at any particular time can be a function
of its initial observations as well as the past communication over the public broadcast channel. In
[6], Csiszár and Narayan showed that to achieve omniscience in a multi-terminal setup with a general
DMMS, interactive communication is not needed. As a result, in the sequel we can assume without loss of generality that
the transmission of each terminal is only a function of its own initial observations. Let $F_i := f_i(X_i^n)$
represent the transmission of terminal $i \in \mathcal{M}$, where $f_i(\cdot)$ is any desired mapping of the observations
$X_i^n$. For each terminal to achieve omniscience, the transmissions $F_i$, $i \in \mathcal{M}$, should satisfy

$$\lim_{n \to \infty} \frac{1}{n} H(X_{\mathcal{M}}^n \mid \mathbf{F}, X_i^n) = 0, \quad \forall i \in \mathcal{M}, \tag{1}$$

where $X_{\mathcal{M}} = (X_1, X_2, \ldots, X_m)$.

Definition 1. A rate tuple $\mathbf{R} = (R_1, R_2, \ldots, R_m)$ is an achievable communication for omniscience (CO)
rate tuple if there exists a communication scheme with transmitted messages $\mathbf{F} = (F_1, F_2, \ldots, F_m)$ that
satisfies (1), i.e., achieves omniscience, and is such that

$$R_i = \lim_{n \to \infty} \frac{1}{n} H(F_i), \quad \forall i \in \mathcal{M}. \tag{2}$$

In the omniscience problem every terminal is a potential transmitter as well as a receiver. As a result,
any set $S \subseteq \mathcal{M}$, $S \neq \mathcal{M}$, defines a cut corresponding to the partition between two sets $S$ and $S^c = \mathcal{M} \setminus S$.
It is easy to show using cut-set bounds that all the achievable CO rate tuples necessarily belong to the
following region

$$\mathcal{R} = \{\mathbf{R} : R(S) \geq H(X_S \mid X_{S^c}), \ \forall S \subset \mathcal{M}\}, \tag{3}$$

where $R(S) = \sum_{i \in S} R_i$. Also, using a random coding argument, it can be shown that the rate region
$\mathcal{R}$ is an achievable rate region [6]. In [8] and [9] the authors provide explicit structured codes based on
syndrome decoding that achieve the rate region for a Slepian-Wolf distributed source coding problem.
This approach was further extended in [10] to a multi-terminal setting.

In this work, we aim to design a polynomial complexity algorithm that achieves omniscience among
all the users while simultaneously minimizing an appropriately defined cost function over the rates. In the
sequel we focus on linear cost functions of the rates as the objective of the optimization problem. To
that end, let $\alpha = (\alpha_1, \ldots, \alpha_m)$, $0 \leq \alpha_i < \infty$, be an $m$-dimensional vector of non-negative finite weights.
We allow the $\alpha_i$'s to be arbitrary non-negative constants, to account for the case when communication of
some group of terminals is more expensive compared to the others, e.g., setting $\alpha_1$ to be a large value
compared to the other weights minimizes the rate allocated to terminal 1. This goal can be formulated
as the following linear program, which hereafter we denote by $LP_1(\alpha)$:

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t. } \mathbf{R} \in \mathcal{R}. \tag{4}$$

We use $\mathcal{R}(\alpha)$ to denote the rate region of all minimizers of the above LP, and $R_{CO}(\alpha)$ to denote the
minimal cost.

Data Exchange Problem with linear correlation among users' observations

As mentioned in Section I, efficient content distribution networks use coding such as Fountain codes or
linear network codes. As a result, the users' observations are linear combinations of the
original packets forming the file, rather than the raw packets themselves as in the conventional
data exchange problem. This linear correlation source model is known in the literature as the finite linear
source [11].

Next, we briefly describe the finite linear source model. Let $q$ be some power of a prime. Consider the
$N$-dimensional random vector $\mathbf{W} \in \mathbb{F}_{q^n}^N$ whose components are independent and uniformly distributed
over the elements of $\mathbb{F}_{q^n}$. Then, in the linear source model, the observation of the $i$-th user is simply given
by

$$\mathbf{X}_i = \mathbf{A}_i \mathbf{W}, \quad i \in \mathcal{M}, \tag{5}$$
where $\mathbf{A}_i \in \mathbb{F}_q^{\ell_i \times N}$ is an observation matrix^3 for user $i$.
It is easy to verify that for the finite linear source model,

$$\frac{H(\mathbf{X}_i)}{\log q^n} = \operatorname{rank}(\mathbf{A}_i). \tag{6}$$

Henceforth, for the finite linear source model we will use the entropy of the observations and the rank
of the observation matrix interchangeably.

For the sake of brevity we use the following notation 

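$$\operatorname{rank}\begin{bmatrix} \mathbf{A} \\ \mathbf{B} \end{bmatrix} \triangleq \operatorname{rank}(\mathbf{A}, \mathbf{B}), \tag{7}$$

$$\operatorname{rank}(\mathbf{A} \mid \mathbf{B}) = \operatorname{rank}(\mathbf{A}, \mathbf{B}) - \operatorname{rank}(\mathbf{B}). \tag{8}$$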



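Similar to the general DMMS model, for the finite linear source model an omniscience achievable rate
tuple necessarily belongs to

$$\mathcal{R}_{de} = \{\mathbf{R} : R(S) \geq \operatorname{rank}(\mathbf{A}_S \mid \mathbf{A}_{S^c}), \ \forall S \subset \mathcal{M}\}, \tag{9}$$

where $R(S) = \sum_{i \in S} R_i$, and $\mathbf{A}_S$ is a matrix obtained by stacking $\mathbf{A}_i$, $\forall i \in S$. The rate $R_i$, $i \in \mathcal{M}$, is
the number of symbols in $\mathbb{F}_{q^n}$ that user $i$ transmits over the noiseless broadcast channel.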
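
To make the rank-based cut-set constraints in (9) concrete, the following Python sketch checks whether a candidate rate tuple satisfies all of them. It is only an illustration under stated assumptions: the three-user instance over the binary field, the restriction to a prime field, and the helper names `rank_mod_p` and `satisfies_cut_sets` are hypothetical and do not come from the paper.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p via Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def satisfies_cut_sets(R, A, p):
    """Check R(S) >= rank(A_S | A_{S^c}) for every nonempty proper subset S of the users."""
    users = sorted(A)
    total = rank_mod_p([row for i in users for row in A[i]], p)
    for size in range(1, len(users)):
        for S in combinations(users, size):
            rest = [row for i in users if i not in S for row in A[i]]
            # rank(A_S | A_{S^c}) = rank(A_S, A_{S^c}) - rank(A_{S^c})
            if sum(R[i] for i in S) < total - rank_mod_p(rest, p):
                return False
    return True

# Hypothetical instance: N = 3 packets, user 3 holds the coded packet w1 + w3.
A = {1: [[1, 0, 0], [0, 1, 0]], 2: [[0, 1, 0], [0, 0, 1]], 3: [[1, 0, 1]]}
print(satisfies_cut_sets({1: 1, 2: 1, 3: 0}, A, 2))
print(satisfies_cut_sets({1: 1, 2: 0, 3: 1}, A, 2))
```

The check enumerates all cuts, so it is exponential in the number of users and is meant only for small sanity checks, not as a substitute for the polynomial-time method developed in the paper.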

III. Optimization over polyhedrons and Edmond's algorithm 

In this section we review results and techniques from the theory of combinatorial optimization. These
results will form a key ingredient in finding a polynomial time algorithm for solving the rate minimization
problem $LP_1(\alpha)$, which will be described in Sections IV and V. The idea is to recast the underlying rate
region $\mathcal{R}$, defined by the cut-set constraints in (3), as a polyhedron of some set function whose dual
is intersecting submodular and can be optimized over in polynomial time. Then, we identify conditions
under which the optimization problem over the dual polyhedron and the original problem have the same
optimal solution.

Here, we state the definitions, theorems and algorithms that will be needed in the next sections. For
a comprehensive exposition of combinatorial optimization, we refer the interested reader to references
[12], [13].

^3 The entries in the observation matrix $\mathbf{A}_i$, $\forall i \in \mathcal{M}$, denote the coefficients of the code, e.g., Fountain code or linear network
code, used by the base station and hence belong to the smaller field $\mathbb{F}_q$ rather than the field $\mathbb{F}_{q^n}$ to which the data packets belong.
This assumption is justified since the coding coefficients are typically stored in the packet in an overhead of size negligible
compared to the packet length.
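Definition 2 (Polyhedron). Let $f$ be a real function defined over the subsets of $\mathcal{M} = \{1, 2, \ldots, m\}$, i.e.,
$f : 2^{\mathcal{M}} \to \mathbb{R}$ such that $f(\emptyset) = 0$, where $2^{\mathcal{M}}$ is the power set of $\mathcal{M}$. Let us define the polyhedron
$P(f, \leq)$ and the base polyhedron $B(f, \leq)$ of $f$ as follows.

$$P(f, \leq) \triangleq \{\mathbf{Z} \mid \mathbf{Z} \in \mathbb{R}^m, \ \forall S \subseteq \mathcal{M} : Z(S) \leq f(S)\}, \tag{10}$$
$$B(f, \leq) \triangleq \{\mathbf{Z} \mid \mathbf{Z} \in P(f, \leq), \ Z(\mathcal{M}) = f(\mathcal{M})\}, \tag{11}$$

where $Z(S) = \sum_{i \in S} Z_i$.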

Example 1. Consider the function $f$ defined over the set $\mathcal{M} = \{1, 2\}$ such that $f(\emptyset) = 0$, $f(\{1\}) = 4$,
$f(\{2\}) = 3$, and $f(\{1, 2\}) = 6$. The polyhedron $P(f, \leq)$ is defined by the region $Z_1 \leq 4$, $Z_2 \leq 3$, and
$Z_1 + Z_2 \leq 6$ (see Figure 2). For the base polyhedron there is the additional constraint $Z_1 + Z_2 = 6$.

Fig. 2. Polyhedron $P(f, \leq)$ and the base polyhedron $B(f, \leq)$ for the function $f$ specified in Example 1.



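Notice that the base polyhedron $B(f, \leq)$ can be an empty set in general. This is the case, for instance, if the
function $f$ in Example 1 is such that $f(\{1, 2\}) = 8$ instead of 6, since then $Z_1 + Z_2 \leq 4 + 3 = 7 < 8$ for every $\mathbf{Z} \in P(f, \leq)$.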

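Definition 3 (Dual function). For a set function $f$ let us define its dual function $f^* : 2^{\mathcal{M}} \to \mathbb{R}$ as follows

$$f^*(S^c) = f(\mathcal{M}) - f(S), \quad \forall S \subseteq \mathcal{M}, \tag{12}$$

where $S^c = \mathcal{M} \setminus S$.
With the dual function $f^*$, we associate its polyhedron and base polyhedron as follows

$$P(f^*, \geq) \triangleq \{\mathbf{R} \mid \mathbf{R} \in \mathbb{R}^m, \ \forall S \subseteq \mathcal{M} : R(S) \geq f^*(S)\}, \tag{13}$$

$$B(f^*, \geq) \triangleq \{\mathbf{R} \mid \mathbf{R} \in P(f^*, \geq), \ R(\mathcal{M}) = f^*(\mathcal{M})\}. \tag{14}$$

Lemma 1. If $B(f, \leq) \neq \emptyset$, then $B(f, \leq) = B(f^*, \geq)$ and $(f^*)^* = f$.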
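
As a quick numerical illustration of Definition 3 and Lemma 1, the following Python sketch computes the dual of the two-element function with values $f(\{1\}) = 4$, $f(\{2\}) = 3$, $f(\{1,2\}) = 6$, and checks that taking the dual twice returns $f$ and that a point of $B(f, \leq)$ also lies in $B(f^*, \geq)$. The dictionary representation and the helper names `subsets`, `dual` and `in_base` are assumptions made for this sketch only.

```python
from itertools import combinations

def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def dual(f, ground):
    """Dual set function: f*(S) = f(M) - f(M \\ S)."""
    M = frozenset(ground)
    return {S: f[M] - f[M - S] for S in subsets(M)}

def in_base(f, Z, ground, sense):
    """Membership in B(f, <=) (sense '<=') or B(f, >=) (sense '>=')."""
    M = frozenset(ground)
    total = lambda S: sum(Z[i] for i in S)
    ok = all(total(S) <= f[S] if sense == "<=" else total(S) >= f[S]
             for S in subsets(M))
    return ok and total(M) == f[M]

M = {1, 2}
f = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 6}
f_star = dual(f, M)
Z = {1: 4, 2: 2}  # a point with Z1 + Z2 = f(M)

print(dual(f_star, M) == f)
print(in_base(f, Z, M, "<="), in_base(f_star, Z, M, ">="))
```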

The proof of Lemma 1 is provided in Appendix A. For the set function $f$ from Example 1, the polyhedron
$P(f^*, \geq)$ and the base polyhedron $B(f^*, \geq)$ are presented in Figure 3. We say that two optimization
problems are equivalent if they have the same optimal value and the same set of optimizers.

Fig. 3. Equivalence between $B(f, \leq)$ and $B(f^*, \geq)$ illustrated for the function $f$ provided in Example 1.

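Lemma 2. If $B(f, \leq) \neq \emptyset$, then the following optimization problems are equivalent

$$\max_{\mathbf{Z}} \sum_{i=1}^{m} Z_i, \quad \text{s.t. } \mathbf{Z} \in P(f, \leq), \tag{15}$$

$$\min_{\mathbf{R}} \sum_{i=1}^{m} R_i, \quad \text{s.t. } \mathbf{R} \in P(f^*, \geq). \tag{16}$$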

Lemma 2 can be easily proved from the following argument provided in [13]. Since $B(f, \leq) \neq \emptyset$,
there exists a vector $\mathbf{Z}$ such that $Z(\mathcal{M}) = f(\mathcal{M}) = f^*(\mathcal{M})$. Moreover, $\mathbf{Z} \in B(f, \leq) = B(f^*, \geq)$. Hence,
$\mathbf{Z}$ is a maximizer of the problem (15) and a minimizer of the problem (16).

Next, we define the class of submodular functions for which the maximization problem (15) has an
analytical solution.






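Definition 4 (Submodularity). A set function $f$ defined on the power set of $\mathcal{M}$, $f : 2^{\mathcal{M}} \to \mathbb{R}$, where
$f(\emptyset) = 0$, is called submodular if

$$f(S) + f(T) \geq f(S \cup T) + f(S \cap T), \quad \forall S, T \subseteq \mathcal{M}. \tag{17}$$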

Remark 1. When $f$ is submodular, then $B(f, \leq) \neq \emptyset$.
For a more general version of the problem (15),

$$\max_{\mathbf{Z}} \sum_{i=1}^{m} \alpha_i Z_i, \quad \text{s.t. } \mathbf{Z} \in P(f, \leq), \tag{18}$$

where $\alpha_i \geq 0$ for $i = 1, \ldots, m$, and $f$ is submodular, an analytical solution can be obtained using
Edmond's algorithm.

Theorem 1 (Edmond's greedy algorithm [14]). When $f$ is submodular, the maximization problem (18),
given by $\max_{\mathbf{Z}} \sum_{i=1}^{m} \alpha_i Z_i$, s.t. $\mathbf{Z} \in P(f, \leq)$, can be solved analytically as follows:

$$Z_{j(i)} = f(A_i) - f(A_{i-1}), \quad i = 1, \ldots, m,$$

where $j(1), j(2), \ldots, j(m)$ is an ordering of $\{1, 2, \ldots, m\}$ such that $\alpha_{j(1)} \geq \alpha_{j(2)} \geq \cdots \geq \alpha_{j(m)}$, and

$$A_0 = \emptyset, \qquad A_i = \{j(1), j(2), \ldots, j(i)\}, \quad i = 1, 2, \ldots, m.$$

The following statement directly follows from Remark 1.

Remark 2. When $f$ is submodular, a maximizer $\mathbf{Z}$ of the optimization problem (18) satisfies $\sum_{i=1}^{m} Z_i = f(\mathcal{M})$.

Example 2. In this example we illustrate Edmond's greedy algorithm by considering the set function $f$
from Example 1 and the optimization problem

$$\max_{\mathbf{Z}} \ 5 Z_1 + Z_2, \quad \text{s.t. } \mathbf{Z} \in P(f, \leq), \tag{19}$$

where $\mathbf{Z} = (Z_1, Z_2)$. Since $\alpha_1 = 5 > \alpha_2 = 1$, we set 1, 2 to be the ordering of $\{1, 2\}$, i.e., $j(1) = 1$ and
$j(2) = 2$. Then, by applying Edmond's algorithm we obtain $Z_1 = 4$, $Z_2 = 2$ to be the maximizer of the
problem (19).
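
The greedy rule of Theorem 1 can be sketched in a few lines of Python. This is a minimal illustration that assumes the set function is given as a dictionary keyed by frozensets and the weights as a dictionary; the name `edmonds_greedy` is hypothetical. Applied to the two-element function with $f(\{1\}) = 4$, $f(\{2\}) = 3$, $f(\{1,2\}) = 6$ and weights $(5, 1)$, it reproduces the maximizer $(4, 2)$ of Example 2.

```python
def edmonds_greedy(f, weights):
    """Greedy maximizer of sum_i weights[i] * Z[i] over P(f, <=), for submodular f.

    f: dict mapping frozenset -> value, with f[frozenset()] == 0.
    weights: dict mapping element -> non-negative weight.
    """
    # Order the elements by non-increasing weight.
    order = sorted(weights, key=lambda i: -weights[i])
    Z, prefix = {}, frozenset()
    for j in order:
        grown = prefix | {j}
        # Z_{j(i)} = f(A_i) - f(A_{i-1})
        Z[j] = f[grown] - f[prefix]
        prefix = grown
    return Z

f = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 6}
print(edmonds_greedy(f, {1: 5, 2: 1}))
```

Reversing the weights, i.e. calling `edmonds_greedy(f, {1: 1, 2: 5})`, visits element 2 first and yields the other vertex of the base polyhedron.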
Fig. 4. Edmond's algorithm applied to the optimization problem (19). Since $\alpha_1 > \alpha_2$, the optimal ordering of $\{1, 2\}$ is 1, 2,
and the second coordinate is obtained as $Z_2 = f(\{1, 2\}) - f(\{1\})$.

Edmond's algorithm is illustrated in Figure 4 for the case $\mathcal{M} = \{1, 2\}$. Notice that each iteration of
the algorithm reaches a boundary of the polyhedron $P(f, \leq)$ until it finally reaches a vertex of the base
polyhedron $B(f, \leq)$.

In [15], it was shown that the following optimization problem can also be solved using Edmond's
greedy algorithm.

Corollary 1. When $f$ is submodular, the optimization problem

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t. } \mathbf{R} \in B(f, \leq), \tag{20}$$

can be solved by using Edmond's algorithm, where $j(1), j(2), \ldots, j(m)$ is an ordering of $\mathcal{M}$ such that
$\alpha_{j(1)} \leq \alpha_{j(2)} \leq \cdots \leq \alpha_{j(m)}$.

Next, we introduce the class of intersecting submodular functions, which is instrumental to solving our
communication for omniscience problem.

Definition 5 (Intersecting Submodularity). A function $f$ defined on the power set of $\mathcal{M}$, $f : 2^{\mathcal{M}} \to \mathbb{R}$,
is called intersecting submodular if

$$f(S) + f(T) \geq f(S \cup T) + f(S \cap T), \quad \forall S, T \ \text{s.t.} \ S \cap T \neq \emptyset. \tag{21}$$



Notice that every submodular function is also intersecting submodular. However, in general, Edmond's
algorithm cannot be directly applied to solve the maximization problem (18) over the polyhedron of an
intersecting submodular function.
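
For small ground sets, conditions (17) and (21) can be verified by brute force. The Python sketch below is an illustration only; the dictionary representation and the names `subsets` and `is_submodular` are assumptions. It is applied to two functions on $\{1, 2\}$ that differ only in the value assigned to $\{1, 2\}$ (6 versus 8).

```python
from itertools import combinations

def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def is_submodular(f, ground, intersecting_only=False):
    """Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T).

    With intersecting_only=True only pairs with a nonempty intersection
    are tested, which is the intersecting submodularity condition.
    """
    sets = list(subsets(ground))
    for S in sets:
        for T in sets:
            if intersecting_only and not (S & T):
                continue
            if f[S] + f[T] < f[S | T] + f[S & T]:
                return False
    return True

M = {1, 2}
f6 = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 6}
f8 = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 8}

print(is_submodular(f6, M))                          # submodular
print(is_submodular(f8, M))                          # fails for S={1}, T={2}
print(is_submodular(f8, M, intersecting_only=True))  # intersecting submodular
```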
In [13] it is shown that for every intersecting submodular function there exists a submodular function
such that both functions have the same polyhedron. This is formally stated in the following theorem.

Theorem 2 (Dilworth truncation). For an intersecting submodular function $f : 2^{\mathcal{M}} \to \mathbb{R}$ with $f(\emptyset) = 0$,
there exists a submodular function $g : 2^{\mathcal{M}} \to \mathbb{R}$ such that $g(\emptyset) = 0$ and $P(g, \leq) = P(f, \leq)$. The function
$g$ can be expressed as

$$g(S) = \min_{\mathcal{P}} \Big\{ \sum_{V \in \mathcal{P}} f(V) : \mathcal{P} \text{ is a partition of } S \Big\}. \tag{22}$$

The function $g$ is called the Dilworth truncation of $f$.

Example 3. Let $\mathcal{M} = \{1, 2\}$, and $f(\{1\}) = 4$, $f(\{2\}) = 3$, $f(\{1, 2\}) = 8$. It is easy to verify that
the function $f$ is intersecting submodular, but not fully submodular since $f(\{1\}) + f(\{2\}) < f(\{1, 2\})$.
Applying Dilworth truncation to the function $f$, we obtain $g$, where $g(\{1\}) = 4$, $g(\{2\}) = 3$, $g(\{1, 2\}) = 7$.
Moreover, it can be checked that $P(g, \leq) = P(f, \leq)$.
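
The Dilworth truncation can be evaluated directly from its definition as a minimum over partitions, at exponential cost. The Python sketch below does this for the function of Example 3; the recursive partition generator and the names `partitions` and `dilworth_truncation` are assumptions made for illustration.

```python
def partitions(items):
    """Yield all set partitions of a list, each as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` in a block of its own ...
        yield [[first]] + part
        # ... or add it to one of the existing blocks.
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

def dilworth_truncation(f, S):
    """g(S) = min over partitions of S of the sum of f over the blocks."""
    if not S:
        return 0
    return min(sum(f[frozenset(block)] for block in part)
               for part in partitions(sorted(S)))

f = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 8}
print([dilworth_truncation(f, S) for S in ({1}, {2}, {1, 2})])
```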

If the Dilworth truncation $g$ of the intersecting submodular function $f$ is given, the optimization problem
(18) can be efficiently solved using Edmond's greedy algorithm. However, finding the value of the function
$g$, even for a single set $S \subseteq \mathcal{M}$, involves a minimization over a set of exponential size (see (22)). This
can be overcome using the facts that $P(g, \leq) = P(f, \leq)$, and that the maximizer of the problem (15)
belongs to the base polyhedron $B(g, \leq)$ by Remark 1. The result is a modified version of Edmond's
algorithm that can solve the optimization problem in polynomial time.

Lemma 3 (Modified Edmond's algorithm, [13], [16]). When $f$ is intersecting submodular, the maximization
problem (18), given by $\max_{\mathbf{Z}} \sum_{i=1}^{m} \alpha_i Z_i$, s.t. $\mathbf{Z} \in P(f, \leq)$, can be solved as follows.

Algorithm 1 Modified Edmond's Algorithm
1: Set $j(1), j(2), \ldots, j(m)$ to be an ordering of $\{1, 2, \ldots, m\}$ such that $\alpha_{j(1)} \geq \alpha_{j(2)} \geq \cdots \geq \alpha_{j(m)}$.
2: Initialize $\mathbf{Z} = \mathbf{0}$.
3: for $i = 1$ to $m$ do
4:   $Z_{j(i)} = \min_{S} \{ f(S) - Z(S) : j(i) \in S, \ S \subseteq A_i \}$, where $A_i = \{j(1), \ldots, j(i)\}$.
5: end for


The following statement directly follows from Theorem 2 and Remark 2.
Remark 3. When $f$ is intersecting submodular, a maximizer $\mathbf{Z}$ of the optimization problem (15) satisfies
$\sum_{i=1}^{m} Z_i = g(\mathcal{M})$, where $g$ is the Dilworth truncation of $f$.

What this algorithm essentially does is that at each iteration $i = 1, 2, \ldots, m$, it identifies a vector
$(Z_{j(1)}, Z_{j(2)}, \ldots, Z_{j(i)})$ that lies on the boundary of the polyhedron $P(f, \leq)$. The polynomial
complexity of the modified Edmond's algorithm is due to the fact that the function $f(S) - Z(S)$ is
submodular over the sets considered in step 4, since every such $S$ contains $j(i)$ and is therefore not empty, so that any two of them intersect,
and finding the minimum value of a submodular function
is known to be possible in polynomial time (see [7]).

Example 4. We illustrate the modified Edmond's algorithm for the function $f$ in Example 3. Let us
consider the maximization problem $\max_{\mathbf{Z}} 5 Z_1 + Z_2$, s.t. $\mathbf{Z} \in P(f, \leq)$. As mentioned above, at each iteration
of the algorithm, the optimal vector should lie on the boundary of the polyhedron $P(f, \leq)$. Hence, $Z_1 = 4$.
In the second iteration, in order to reach the boundary of $P(f, \leq)$, $Z_2$ can be either $f(\{1, 2\}) - Z_1 = 4$,
or $f(\{2\}) = 3$. Since the first choice results in a vector that does not belong to $P(f, \leq)$, the solution
is $Z_2 = 3$ (see Figure 5).
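
The steps of Algorithm 1 can be sketched in Python as follows. In this illustration the inner minimization over sets $S$ is done by exhaustive enumeration, which is an assumption made for brevity: a polynomial-time implementation would call a submodular function minimization routine at that step instead. The function name `modified_edmonds` and the dictionary representation are likewise hypothetical.

```python
from itertools import combinations

def modified_edmonds(f, weights):
    """Sketch of the modified greedy algorithm for intersecting submodular f.

    f: dict mapping frozenset -> value; weights: dict element -> weight.
    The minimization over S is brute force (exponential in the worst case).
    """
    order = sorted(weights, key=lambda i: -weights[i])
    Z = {i: 0 for i in weights}
    for pos, j in enumerate(order):
        earlier = order[:pos]  # elements j(1), ..., j(i-1)
        best = None
        for r in range(len(earlier) + 1):
            for extra in combinations(earlier, r):
                S = frozenset(extra) | {j}
                # f(S) - Z(S); Z[j] is still 0 at this point
                slack = f[S] - sum(Z[k] for k in S)
                best = slack if best is None else min(best, slack)
        Z[j] = best
    return Z

f = {frozenset(): 0, frozenset({1}): 4, frozenset({2}): 3, frozenset({1, 2}): 8}
print(modified_edmonds(f, {1: 5, 2: 1}))
```

For the function of Example 3 this returns $Z_1 = 4$, $Z_2 = 3$, in agreement with Example 4. If the value on $\{1, 2\}$ is changed to 6, the function is submodular and the same routine returns $(4, 2)$, the output of the unmodified greedy algorithm.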




Fig. 5. Modified Edmond's algorithm applied to the maximization problem over the polyhedron $P(f, \leq)$, where $f(\emptyset) = 0$,
$f(\{1\}) = 4$, $f(\{2\}) = 3$, $f(\{1, 2\}) = 8$.



Theorem 3 (Complexity of the modified Edmond's algorithm [16], [13]). For an intersecting submodular
function $f$, the optimization problem (18) can be solved in polynomial time using the modified version
of Edmond's algorithm described in Lemma 3. The complexity of this algorithm is $O(m \cdot SFM(m))$,
where $SFM(m)$ is the complexity of minimizing a submodular function.
Remark 4. The submodular function minimization routine can be done in polynomial time. The best
known algorithm to our knowledge is proposed by Orlin in [7], and has complexity $O(m^5 \cdot \gamma + m^6)$,
where $\gamma$ is the complexity of computing the submodular function.



IV. Communication for Omniscience Rates 

In this section we propose an efficient algorithm for computing a rate tuple which belongs to $\mathcal{R}(\alpha)$,
i.e., an optimal rate tuple w.r.t. the optimization problem

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t. } \mathbf{R} \in \mathcal{R}. \tag{23}$$

We start with the special case when $\alpha = [1 \ 1 \ \cdots \ 1]$, henceforth denoted as $\alpha = \mathbf{1}$. This instance
represents a key building block for solving the problem for a general cost vector $\alpha$. We begin by observing
that the rate region $\mathcal{R}$ defined in (3) can be represented as a polyhedron of some set function, say $f^*$,
to be defined later. In this section we solve $LP_1(\mathbf{1})$ by considering the dual set function $f$ of $f^*$,
and solving the corresponding dual optimization problem. We show that it is possible to construct a
function $f^*$ defining the rate region $\mathcal{R}$ such that its dual function $f$ is intersecting submodular. Therefore,
the underlying optimization problem can be solved in polynomial time using the modified Edmond's
algorithm. The optimization problem $LP_1(\mathbf{1})$ can be stated as follows



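$$\min_{\mathbf{R}} \sum_{i=1}^{m} R_i, \quad \text{s.t. } \mathbf{R} \in P(f^*, \geq), \tag{24}$$

where $P(f^*, \geq)$ is a polyhedron of a set function $f^*$ such that $P(f^*, \geq) = \mathcal{R}$. To that end, we can choose

$$f^*(S) = H(X_S \mid X_{S^c}), \quad \forall S \subset \mathcal{M}. \tag{25}$$

Notice that the function $f^*$ is not completely defined in (25) because the value of $f^*(\mathcal{M})$ is missing.
Therefore, we need to assign $f^*(\mathcal{M})$ such that $P(f^*, \geq) = \mathcal{R}$ and $B(f^*, \geq) \neq \emptyset$. The second condition
ensures equivalence between the optimization problem (24) and the corresponding dual problem (see
Lemma 2). It is not hard to see that taking $f^*(\mathcal{M}) = R_{CO}(\mathbf{1})$ satisfies all the conditions above. Thus,
we have

$$f^*(S) = \begin{cases} H(X_S \mid X_{S^c}) & \text{if } S \subset \mathcal{M}, \\ R_{CO}(\mathbf{1}) & \text{if } S = \mathcal{M}. \end{cases} \tag{26}$$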
Of course $R_{CO}(\mathbf{1})$ is not known a priori, but this issue will be addressed later. According to Definition 3,
the dual set function $f$ of $f^*$ has the following form

$$f(S) = \begin{cases} R_{CO}(\mathbf{1}) - H(X_{S^c} \mid X_S) & \text{if } \emptyset \neq S \subseteq \mathcal{M}, \\ 0 & \text{if } S = \emptyset. \end{cases} \tag{27}$$



Using the duality result in Lemma 2, it follows that the optimization problem (24) is equivalent to

$$\max_{\mathbf{Z}} \sum_{i=1}^{m} Z_i, \quad \text{s.t. } \mathbf{Z} \in P(f, \leq). \tag{28}$$



To avoid cumbersome expressions, hereafter we use $P(f)$ and $B(f)$ to denote $P(f, \leq)$ and $B(f, \leq)$,
respectively. Hence, the optimal value of the optimization problem (28) is $R_{CO}(\mathbf{1})$. However, the value
of $R_{CO}(\mathbf{1})$ is not known a priori. To that end, let us replace $R_{CO}(\mathbf{1})$ in (27) with a variable $\beta$, and
construct a two-argument function $f(S, \beta)$ as follows.

$$f(S, \beta) = \begin{cases} \beta - H(X_{S^c} \mid X_S) & \text{if } \emptyset \neq S \subseteq \mathcal{M}, \\ 0 & \text{if } S = \emptyset. \end{cases} \tag{29}$$



Lemma 4. Function $f(S, \beta)$ defined in (29) is intersecting submodular. When $\beta \geq H(X_{\mathcal{M}})$, the function
$f(S, \beta)$ is submodular.
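
Lemma 4 can be checked numerically on a small raw-packet instance. In the Python sketch below, each user holds a subset of $N$ independent, uniformly distributed packets, so that, measured in packets, $H(X_{S^c} \mid X_S)$ equals the number of packets not held by any user in $S$. The instance itself and the names `make_f` and `holds` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def subsets(ground):
    items = sorted(ground)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def make_f(packets, n_packets, beta):
    """f(S, beta) = beta - H(X_{S^c} | X_S) for nonempty S, and f(empty) = 0."""
    def f(S):
        if not S:
            return 0
        known = set().union(*(packets[i] for i in S))
        # Entropy in packet units: packets missing from the users in S.
        return beta - (n_packets - len(known))
    return f

def holds(f, ground, intersecting_only):
    """Brute-force check of the (intersecting) submodular inequality."""
    sets = list(subsets(ground))
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in sets for T in sets
               if not intersecting_only or (S & T))

# Hypothetical instance: 3 users, N = 3 packets, so H(X_M) = 3 packets.
packets = {1: {1, 2}, 2: {2, 3}, 3: {1, 3}}
users = set(packets)

f_small = make_f(packets, 3, beta=1)
f_large = make_f(packets, 3, beta=3)
print(holds(f_small, users, intersecting_only=True))   # intersecting submodular
print(holds(f_small, users, intersecting_only=False))  # not submodular for small beta
print(holds(f_large, users, intersecting_only=False))  # submodular once beta >= H(X_M)
```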

The proof of Lemma 4 is provided in Appendix B. Considering the optimization problem

$$\max_{\mathbf{Z}} \sum_{i=1}^{m} Z_i, \quad \text{s.t. } \mathbf{Z} \in P(f, \beta), \tag{30}$$

as a function of $\beta$, the goal is to identify its characteristics at the point $\beta = R_{CO}(\mathbf{1})$. Hereafter, we refer
to the optimization problem (30) as $LP_2(\beta)$.

Theorem 4. The optimal value $R_{CO}(\mathbf{1})$ can be obtained as follows

$$R_{CO}(\mathbf{1}) = \min \beta \ \text{ such that } \beta \text{ is the optimal value of } LP_2(\beta). \tag{31}$$

Proof: We prove this theorem by contradiction. First, notice that $\beta = R_{CO}(\mathbf{1})$ is a feasible solution
for the optimization problem (31). Next, let us assume that for some $\beta' < R_{CO}(\mathbf{1})$ there exists a vector
$\mathbf{Z}$ that is a maximizer of the problem $LP_2(\beta')$ such that $Z(\mathcal{M}) = \beta' = f(\mathcal{M}, \beta')$. Since $\mathbf{Z} \in P(f, \beta')$, it
must satisfy the following set of inequalities

$$Z(S) \leq \beta' - H(X_{S^c} \mid X_S), \quad \forall \, \emptyset \neq S \subseteq \mathcal{M}. \tag{32}$$

Since $\beta' = Z(\mathcal{M})$, and $Z(S^c) = Z(\mathcal{M}) - Z(S)$, we can write (32) as

$$Z(S^c) \geq H(X_{S^c} \mid X_S), \quad \forall \, \emptyset \neq S \subseteq \mathcal{M}. \tag{33}$$

Therefore, $\mathbf{Z} \in \mathcal{R}$ is a feasible rate tuple w.r.t. the optimization problem $LP_1(\mathbf{1})$ and, hence, it must hold
that $\beta' \geq R_{CO}(\mathbf{1})$. This is in contradiction with our previous statement that $\beta' < R_{CO}(\mathbf{1})$. ■

Since $R_{CO}(\mathbf{1})$ can be trivially upper bounded by $H(X_{\mathcal{M}})$ and lower bounded by 0, we can restrict
the search space in (31) to $0 \leq \beta \leq H(X_{\mathcal{M}})$.

Function $f(S, \beta)$ is intersecting submodular for the case of interest when $0 \leq \beta \leq H(X_{\mathcal{M}})$. As noted
in Theorem 2, for the intersecting submodular function $f(S, \beta)$ there exists a submodular function,
its Dilworth truncation, here denoted by $g(S, \beta)$, such that $P(f, \beta) = P(g, \beta)$:

$$g(S, \beta) = \min_{\mathcal{P}} \Big\{ \sum_{V \in \mathcal{P}} f(V, \beta) : \mathcal{P} \text{ is a partition of } S \Big\}. \tag{34}$$

Definition 6. Let $\mathcal{P}(\beta)$ denote an optimal partitioning of the set $\mathcal{M}$ according to (34) for the given $\beta$.

From Remark 3 it follows that $g(\mathcal{M}, \beta)$ is the optimal value of the optimization problem $LP_2(\beta)$ for any
given $\beta$. Hence, it can be obtained in polynomial time by applying the modified Edmond's algorithm to
the set function $f(S, \beta)$. Moreover, the corresponding optimal partition $\mathcal{P}(\beta)$ can be efficiently obtained
by adding two additional steps to the modified Edmond's algorithm, as shown in [16] and [13] (see
Algorithm 3 in Appendix D).

From Theorem 4 it follows that the optimal omniscience rate $R_{CO}(\mathbf{1})$ can be calculated as follows:

$$R_{CO}(\mathbf{1}) = \min_{0 \leq \beta \leq H(X_{\mathcal{M}})} \beta, \quad \text{s.t. } g(\mathcal{M}, \beta) = \beta. \tag{35}$$

Notice that $g(\mathcal{M}, \beta) = f(\mathcal{M}, \beta) = \beta$ whenever the optimal partitioning of the set $\mathcal{M}$ according to (34)
is of cardinality 1, i.e., $\mathcal{P}(\beta) = \{\mathcal{M}\}$.

In the following we show how to solve the optimization problem (35) with at most $m$ calls of the
modified Edmond's algorithm, which makes the complexity of the entire algorithm polynomial in $m$.
From (34) it follows that for every $\beta$, the function $g(\mathcal{M}, \beta)$ can be represented as

$$g(\mathcal{M}, \beta) = |\mathcal{P}(\beta)| \, \beta - \sum_{S \in \mathcal{P}(\beta)} H(X_{S^c} \mid X_S). \tag{36}$$

Therefore, $g(\mathcal{M}, \beta)$ is piecewise linear in $\beta$.

Lemma 5. Function $g(\mathcal{M},\beta)$ has the following properties:
1) It has at most $m$ linear segments.
2) It has non-increasing slope, i.e., $g(\mathcal{M},\beta)$ is a concave function.
3) The last linear segment is of slope 1.

Moreover, $\beta = R_{CO}(\underline{1})$ is the breakpoint of $g(\mathcal{M},\beta)$ between the linear segment with slope 1 and the adjacent linear segment with larger slope.

The proof of Lemma 5 is provided in Appendix C. From (36) it follows that the slope of the function $g(\mathcal{M},\beta)$ is equal to the cardinality of the optimal partition $\mathcal{P}(\beta)$. Since there are at most $m$ linear segments in $g(\mathcal{M},\beta)$, we can solve for the breakpoint of interest according to Lemma 5 in polynomial time by performing a binary search. We explain this procedure on a simple case described in Figure 6. From Lemma 5 we have that $\beta = R_{CO}(\underline{1})$ is the breakpoint of $g(\mathcal{M},\beta)$ between the linear segment with slope 1 and the adjacent linear segment with larger slope. Moreover, for every $\beta$ one can obtain the value of $g(\mathcal{M},\beta)$ and the corresponding optimal partition w.r.t. (34) in polynomial time using Algorithm 3 in Appendix D. Due to the concavity of $g(\mathcal{M},\beta)$, the following algorithm converges to the breakpoint $\beta = R_{CO}(\underline{1})$ in at most $m$ iterations.

Fig. 6. Optimal $R_{CO}(\underline{1})$ can be obtained by intersecting linear segments. First, we intersect the line $L_1$, which corresponds to $\beta = 0$, with the 45-degree line $L_2$. The intersection point $\beta_1$ belongs to a linear segment with slope greater than 1. Then, intersecting the segment $L_3$ to which $\beta_1$ belongs with the 45-degree line $L_2$, we obtain $\beta_2$, and finally $\beta_3$ after one more intersection. Since the linear segment at $\beta_3$ has slope 1, we conclude that $\beta_3 = R_{CO}(\underline{1})$.

Since $R_{CO}(\underline{1}) \ge 0$, we start by intersecting the line $L_1$, which contains the linear segment at $\beta = 0$, with the 45-degree line $L_2$, which corresponds to the last (rightmost) linear segment. The slope of the line $L_1$ as well as its value can be obtained in polynomial time by applying Algorithm 3 for $\beta = 0$. Since the function $g(\mathcal{M},\beta)$ is piecewise linear and concave, the point of intersection $\beta_1$ must belong to a linear segment with slope smaller than $|\mathcal{P}(0)|$, i.e., $|\mathcal{P}(\beta_1)| < |\mathcal{P}(0)|$. $\beta_1$ can be obtained by equating $\beta$ with $\sum_{S \in \mathcal{P}(0)} \big(\beta - H(X_{S^c}|X_S)\big)$. Hence,

$$\beta_1 = \frac{\sum_{S \in \mathcal{P}(0)} H(X_{S^c}|X_S)}{|\mathcal{P}(0)| - 1}. \tag{37}$$

Next, by applying Algorithm 3 for $\beta = \beta_1$, we get $(g(\mathcal{M},\beta_1), \mathcal{P}(\beta_1))$. Since $|\mathcal{P}(\beta_1)| > 1$ (see Figure 6), we have not reached the breakpoint of interest yet, because $R_{CO}(\underline{1})$ belongs to the linear segment of slope 1. Thus, we proceed by intersecting the line $L_3$, which contains the linear segment at $\beta = \beta_1$, with the 45-degree line $L_2$. As in the previous case, we obtain $\beta_2 = \frac{\sum_{S \in \mathcal{P}(\beta_1)} H(X_{S^c}|X_S)}{|\mathcal{P}(\beta_1)| - 1}$. Since $|\mathcal{P}(\beta_2)| > 1$, we need to perform one more intersection to obtain $\beta_3$, for which $|\mathcal{P}(\beta_3)| = 1$. Hence, $\beta_3 = R_{CO}(\underline{1})$. For an arbitrary $g(\mathcal{M},\beta)$, the binary search algorithm can be constructed as follows.

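Algorithm 2 Achieving a rate tuple from the region $\mathcal{R}(\underline{1})$
1: Initialize $\beta = 0$.
2: while $|\mathcal{P}(\beta)| > 1$ do
3:   $\beta = \frac{\sum_{S \in \mathcal{P}(\beta)} H(X_{S^c}|X_S)}{|\mathcal{P}(\beta)| - 1}$, where $\mathcal{P}(\beta)$ is obtained from Algorithm 3
4: end while
5: $R_{CO}(\underline{1}) = \beta$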



It is not hard to see that Algorithm 2 executes at most $m$ iterations, since with each iteration the intersection point moves to the right to some other linear segment until it hits $R_{CO}(\underline{1})$ (see Figure 6).

Therefore, Algorithm 2 calls Algorithm 3 at most $m$ times. Since the complexity of Algorithm 3 is $O(m \cdot SFM(m))$ (see Appendix D), the total complexity of obtaining a rate tuple that belongs to $\mathcal{R}(\underline{1})$ through Algorithm 2 is $O(m^2 \cdot SFM(m))$.
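The iteration of Algorithm 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the oracle `dilworth` enumerates all partitions by brute force (exponential in $m$) and merely stands in for the polynomial-time Algorithm 3; the side information is assumed to be uncoded packets, so that $H(X_{S^c}|X_S)$ is the number of packets missing at the users in $S$; ties are broken toward the partition with the fewest blocks. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def partitions(items):
    """Yield every partition of the list `items` as a list of tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest) + 1):
        for others in combinations(rest, k):
            block = (first,) + others
            remaining = [x for x in rest if x not in others]
            for tail in partitions(remaining):
                yield [block] + tail

def missing_packets(block, packets):
    """H(X_{S^c} | X_S) in packets, assuming uncoded side information."""
    everything = set().union(*packets.values())
    known = set().union(*(packets[u] for u in block))
    return len(everything - known)

def dilworth(beta, packets):
    """Brute-force stand-in for Algorithm 3: g(M, beta) and a minimizing partition."""
    best = None
    for part in partitions(sorted(packets)):
        value = sum(beta - missing_packets(b, packets) for b in part)
        key = (value, len(part))  # prefer fewer blocks on ties
        if best is None or key < best[0]:
            best = (key, part)
    return best[0][0], best[1]

def omniscience_rate(packets):
    """Algorithm 2: intersect the current segment with the 45-degree line."""
    beta = Fraction(0)
    _, part = dilworth(beta, packets)
    while len(part) > 1:
        total = sum(missing_packets(b, packets) for b in part)
        beta = Fraction(total, len(part) - 1)
        _, part = dilworth(beta, packets)
    return beta

# Three users, each holding two of the three packets a, b, c.
example = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"b", "c"}}
print(omniscience_rate(example))  # 3/2
```

Exact rational arithmetic is used so that the tie at the breakpoint, where the singleton partition and $\{\mathcal{M}\}$ give the same value, is detected reliably.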

V. Achieving a rate tuple that belongs to $\mathcal{R}(\underline{\alpha})$

In this section we investigate the problem of computing a rate tuple that belongs to $\mathcal{R}(\underline{\alpha})$, where $0 \le \alpha_i < \infty$, $i = 1, 2, \ldots, m$. We propose an algorithm of polynomial complexity that is based on the results we derived for the $\mathcal{R}(\underline{1})$ case.

Let us start by restating the optimization problem $LP_1(\underline{\alpha})$ in the following way:

$$\min_{\beta} \min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i \quad \text{s.t. } R(\mathcal{M}) = \beta, \ R(S) \ge H(X_S|X_{S^c}), \ \forall S \subset \mathcal{M}, \tag{38}$$

where $\beta \ge R_{CO}(\underline{1})$. Hereafter we denote optimization problem (38) by $LP_3(\underline{\alpha})$. This interpretation of the problem $LP_1(\underline{\alpha})$ corresponds to finding its optimal value by searching over all achievable sum rates $R(\mathcal{M})$. Let us focus on the second term in optimization (38):

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i \quad \text{s.t. } R(\mathcal{M}) = \beta, \ R(S) \ge H(X_S|X_{S^c}), \ \forall S \subset \mathcal{M}. \tag{39}$$

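Observe that the rate region in (39) constitutes a base polyhedron $B(f^*,\beta,\ge)$, where

$$f^*(S,\beta) = \begin{cases} H(X_S|X_{S^c}) & \text{if } S \subset \mathcal{M}, \\ \beta & \text{if } S = \mathcal{M}. \end{cases} \tag{40}$$

Since $\beta \ge R_{CO}(\underline{1})$ we have that $B(f^*,\beta,\ge) \ne \emptyset$. From Lemma 1 it follows that $B(f^*,\beta,\ge) = B(f,\beta)$, where $f(S,\beta)$, defined in (29), is the dual set function of $f^*(S,\beta)$. Hence, the optimization problem (39) is equivalent to

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i \quad \text{s.t. } \mathbf{R} \in B(f,\beta). \tag{41}$$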

In Corollary 1 we implied that for any fixed $\beta \ge R_{CO}(\underline{1})$ the optimization problem (41) can be solved using Edmonds' algorithm, with $j(1), j(2), \ldots, j(m)$ being the ordering of $\mathcal{M}$ such that $\alpha_{j(1)} \le \alpha_{j(2)} \le \cdots \le \alpha_{j(m)}$. However, since the function $f(S,\beta)$ is intersecting submodular, it is necessary to apply the modified version of Edmonds' algorithm provided in Lemma 3 to obtain an optimal rate tuple w.r.t. (41). Let $h(\beta)$ denote the optimal value of the optimization problem defined in (41):

$$h(\beta) = \min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i \quad \text{s.t. } \mathbf{R} \in B(f,\beta). \tag{42}$$

To that end, we can state problem $LP_3(\underline{\alpha})$ as

$$\min_{\beta} h(\beta), \quad \text{s.t. } \beta \ge R_{CO}(\underline{1}). \tag{43}$$

With every $\beta \ge R_{CO}(\underline{1})$ we associate an optimal rate vector $\mathbf{R}$ w.r.t. optimization problem (42). Next, we show some basic properties of the function $h(\beta)$.

Lemma 6. Function $h(\beta)$ defined in (42) is continuous and convex when $\beta \ge R_{CO}(\underline{1})$.

Proof of Lemma 6 is provided in Appendix E.



A. Gradient Descent Method 

From Lemma 6 it immediately follows that we can apply a gradient descent algorithm to minimize the function $h(\beta)$. However, in order to do that, at every point $\beta$ we need to know the value of $h(\beta)$ as well as its derivative. As mentioned above, an optimal rate tuple that corresponds to the function $h(\beta)$ can be obtained by applying the modified Edmonds' algorithm to the problem (41). From Lemma 3 it follows that the optimal rate vector with respect to the optimization problem (41) has the following form:

$$R_i = b_i \cdot \beta + c_i, \quad \forall i \in \mathcal{M}, \tag{44}$$

where $b_i \in \mathbb{Z}$, and $c_i$ is a constant which corresponds to a summation of some conditional entropy terms. Moreover, it follows that the coefficients $(b_i, c_i)$, $i = 1, 2, \ldots, m$, depend only on the value of $\beta$ (they do not depend on the weight vector $\underline{\alpha}$).

Lemma 7. Function $h(\beta)$ is piecewise linear in $\beta$. For a fixed $\beta \ge R_{CO}(\underline{1})$ the values of $h(\beta)$ and $\frac{dh(\beta)}{d\beta}$ can be obtained in $O(m \cdot SFM(m))$ time by applying the modified Edmonds' algorithm to the ordering of $\mathcal{M}$ specified in Corollary 1. The derivative of $h(\beta)$ can be calculated by expressing the optimal rates $R_i$, $i \in \mathcal{M}$, as $R_i = b_i \cdot \beta + c_i$ in each iteration of the modified Edmonds' algorithm. Then,

$$\frac{dh(\beta)}{d\beta} = \sum_{i=1}^{m} \alpha_i \cdot b_i. \tag{45}$$

To make the gradient descent algorithm more efficient, it is useful to make the search space as tight as possible. So far, we showed that the minimizer of the problem $LP_3(\underline{\alpha})$ belongs to the region $[R_{CO}(\underline{1}), \infty)$. Combining the results of Lemma 6 and Lemma 7 we have the following bound.

Lemma 8. Let $\beta^*$ be the minimizer of the optimization problem $LP_3(\underline{\alpha})$. Then,

$$R_{CO}(\underline{1}) \le \beta^* \le H(X_{\mathcal{M}}). \tag{46}$$

Proof: Note that the function $f(S,\beta)$ is submodular when $\beta = H(X_{\mathcal{M}})$ (see Lemma 4). Optimization problem (39) for $\beta = H(X_{\mathcal{M}})$ can be solved by applying Edmonds' algorithm (see Theorem 1) to the optimization problem (41). It is easy to verify that the optimal rates have the following form:

= « G {2,3, . . . ,m}. 

Hence, 

m 

h{f3 = H{Xm)) = + XI ^j(i)^3{i)- 

1=1 

Since > 0, and function h{(3) is convex, it immediately follows that /3* < H{Xj^). ■ 
Since the function $h(\beta)$ is continuous, convex, and differentiable everywhere except at its breakpoints, we can find its minimum, and therefore solve the optimization problem $LP_3(\underline{\alpha})$, by applying a gradient descent algorithm. However, in the general case, we can only reach the optimal point up to some precision $\epsilon$. In order to be at most $\epsilon$ away from the optimal solution, the gradient descent method executes approximately $O(\log \frac{1}{\epsilon})$ iterations [11]. Therefore, the total complexity of obtaining a rate tuple with a sum rate that is at most $\epsilon$ away from the optimal one
is $O\big(m^2 \cdot SFM(m) + \log\frac{H(X_{\mathcal{M}})}{\epsilon} \cdot m \cdot SFM(m)\big)$, where the first term corresponds to the complexity of finding $R_{CO}(\underline{1})$.
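The one-dimensional search over $\beta$ can be sketched as follows. Note the swapped-in technique: instead of a fixed-step gradient descent, this sketch bisects on the sign of the slope, which for a convex function of one variable also needs a number of oracle calls logarithmic in (interval length)/$\epsilon$. The function `toy_oracle` is a hypothetical stand-in for the modified Edmonds' algorithm, which would return $h(\beta)$ and the slope (45); here it is simply the maximum of three made-up lines.

```python
from fractions import Fraction

def minimize_convex(oracle, lo, hi, eps):
    """Shrink [lo, hi] to length <= eps using only the sign of the slope.

    `oracle(beta)` must return (h(beta), slope of h at beta) for a convex h.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2
        _, slope = oracle(mid)
        if slope > 0:
            hi = mid      # a minimizer lies to the left of mid
        elif slope < 0:
            lo = mid      # a minimizer lies to the right of mid
        else:
            return mid    # flat segment: mid is already a minimizer
    return (lo + hi) / 2

def toy_oracle(beta):
    """Hypothetical convex piecewise-linear h: the maximum of three lines."""
    lines = [(Fraction(-2), Fraction(4)),
             (Fraction(-1, 2), Fraction(2)),
             (Fraction(1), Fraction(-1))]
    slope, offset = max(lines, key=lambda line: line[0] * beta + line[1])
    return slope * beta + offset, slope

eps = Fraction(1, 100)
beta_gd = minimize_convex(toy_oracle, Fraction(0), Fraction(4), eps)
print(float(beta_gd))  # within eps of the true minimizer beta* = 2
```

In the setting of this section, `lo` and `hi` would be $R_{CO}(\underline{1})$ and $H(X_{\mathcal{M}})$, following Lemma 8.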

Before we go any further, let us briefly analyze a solution to the optimization problem $LP_1(\underline{\alpha})$. We can think of it as the minimal value $C$ for which the plane $C = \sum_{i=1}^{m} \alpha_i R_i$ intersects the rate region $\mathcal{R}$. It is not hard to conclude that the point of intersection is one of the "vertices" of the region $\mathcal{R}$, i.e., it is completely defined by a collection of sets $\{S_1, S_2, \ldots, S_m\}$ such that

$$R(S_i) = H(X_{S_i}|X_{S_i^c}), \quad i \in \{1, 2, \ldots, m\}.$$

The following theorem will be very useful in Section VI when we explore the finite linear source model. It represents a key building block for bounding the total number of breakpoints in $h(\beta)$.

Theorem 5. For every breakpoint of the function $h(\beta)$, the corresponding rate vector $\mathbf{R}$ that minimizes (42) is a vertex of the rate region $\mathcal{R}$.

Proof: Due to the equivalence between the problems $LP_1(\underline{\alpha})$ and $LP_3(\underline{\alpha})$ it follows that for every $\underline{\alpha}$, the rate tuple $\mathbf{R}$ which corresponds to the minimizer of the function $h(\beta)$ is a vertex of the rate region $\mathcal{R}$. For a given cost vector $\underline{\alpha}$, we prove this theorem by modifying $\underline{\alpha}$ such that each breakpoint in $h(\beta)$ can become the minimizer of the function $h$ that corresponds to the modified vector $\underline{\alpha}$.

To that end, let us consider the example of $h(\beta)$ shown in Figure 7. Each linear segment of $h(\beta)$ is described by a pair of vectors $(\mathbf{b}^{(j)}, \mathbf{c}^{(j)})$, $j = 1, 2, 3, 4$, as in (44). The function is minimized when $\beta = \beta_3$.

Fig. 7. Function $h(\beta)$ is piecewise linear in $\beta$. For the purpose of proving Theorem 5 we consider 4 linear segments, and show that each breakpoint can become the minimizer of a different optimization problem.

First, we show how to modify $\underline{\alpha}$ so that the breakpoint $\beta_2$ becomes the minimizer of $LP_3(\underline{\alpha})$. From (44), we have that the slopes of the segments $[\beta_1,\beta_2]$ and $[\beta_2,\beta_3]$ are such that $\sum_{i=1}^{m} \alpha_i b_i^{(1)} < 0$ and $\sum_{i=1}^{m} \alpha_i b_i^{(2)} < 0$. Since $h(\beta)$ is convex, it also holds that

$$\sum_{i=1}^{m} \alpha_i b_i^{(1)} \le \sum_{i=1}^{m} \alpha_i b_i^{(2)}. \tag{47}$$

Observe that for every $\beta \ge R_{CO}(\underline{1})$, the rate tuple that corresponds to $h(\beta)$ is such that $R(\mathcal{M}) = \beta$. Hence, for each linear segment it holds that $\sum_{i=1}^{m} b_i^{(j)} = 1$, $j = 1, 2, 3, 4$. Let

$$\alpha'_i = \alpha_i + \Delta\alpha, \quad i = 1, 2, \ldots, m, \tag{48}$$

where $\Delta\alpha > 0$ is a constant. For the weight vector $\underline{\alpha}'$ constructed in (48), the segments $[\beta_1,\beta_2]$ and $[\beta_2,\beta_3]$ have slopes

$$\sum_{i=1}^{m} b_i^{(j)}(\alpha_i + \Delta\alpha) = \sum_{i=1}^{m} b_i^{(j)}\alpha_i + \Delta\alpha, \quad j = 1, 2.$$

Therefore, we can pick $\Delta\alpha$ such that the linear segment $[\beta_1,\beta_2]$ has negative slope, while the linear segment $[\beta_2,\beta_3]$ has positive slope. One possible choice is

$$\Delta\alpha = -\sum_{i=1}^{m} \alpha_i b_i^{(2)} + \epsilon, \tag{49}$$

where $\epsilon$ is a small positive constant. Note that due to (47) the linear segment $[\beta_1,\beta_2]$ still has negative slope. Similarly, we can move the minimizer of $LP_3(\underline{\alpha})$ from $\beta_3$ to $\beta_4$ by modifying $\underline{\alpha}$ as follows:

$$\alpha'_i = \alpha_i - \Delta\alpha, \quad i = 1, 2, \ldots, m, \tag{50}$$

where $\Delta\alpha = \sum_{i=1}^{m} \alpha_i b_i^{(3)} + \epsilon$. In this case, the linear segments $[\beta_3,\beta_4]$ and $[\beta_4,\beta_5]$ have slopes $\sum_{i=1}^{m} \alpha'_i b_i^{(3)} < 0$ and $\sum_{i=1}^{m} \alpha'_i b_i^{(4)} > 0$, which makes $\beta = \beta_4$ the minimizer of $LP_3(\underline{\alpha}')$. Therefore, we showed how to modify the cost vector $\underline{\alpha}$ so that the minimizer of $h(\beta)$ "jumps" to the adjacent breakpoints of $h(\beta)$. Repeating this procedure multiple times, one can modify $\underline{\alpha}$ so that any breakpoint becomes the minimizer of $h(\beta)$. ■
VI. Data Exchange Problem with Linear Correlations

In this section we propose a polynomial time algorithm for achieving a rate tuple that belongs to the region $\mathcal{R}(\underline{\alpha})$ in the data exchange problem. In Section II we defined a linear model where each user $i \in \mathcal{M}$ observes a collection of linear equations in $\mathbb{F}_{q^n}$,

$$\mathbf{X}_i = \mathbf{A}_i \mathbf{W}, \quad i \in \mathcal{M}, \tag{51}$$

where $\mathbf{A}_i \in \mathbb{F}_q^{\ell_i \times N}$ is a fixed matrix and $\mathbf{W} \in \mathbb{F}_{q^n}^{N}$ is a vector of data packets. Since all the algebraic operations are performed over the base field $\mathbb{F}_q$, the linear model (51) is equivalent to the scenario where each user observes $n$ memoryless instances of the finite linear process (51), where $\mathbf{W}$ is a uniform vector over $\mathbb{F}_q^{N}$. Hereafter, we will use the entropy of the observations and the rank of the observation matrix interchangeably.

Theorem 6. For the linear source model, any rate tuple $\mathbf{R}$ that belongs to the rate region $\mathcal{R}_{de}$ can be achieved via linear network coding, i.e., in order to achieve omniscience it is sufficient for each user $i \in \mathcal{M}$ to transmit $R_i$ properly chosen linear equations of the data packets he observes.

Proof of Theorem 6 is provided in Appendix G. This result suggests that in an optimal communication scheme, each user transmits some integer number of symbols in $\mathbb{F}_q$. Hence, each rate in a rate tuple that belongs to $\mathcal{R}(\underline{\alpha})$ in the data exchange problem has to be a fractional number with the denominator $n$. To that end, we introduce a fractional rate constraint to the optimization problem $LP_1(\underline{\alpha})$ in order to obtain the optimal solution for the data exchange problem:

$$\min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t. } R(S) \ge H(X_S|X_{S^c}), \ \forall S \subset \mathcal{M}, \tag{52}$$

where $n \cdot R_i \in \mathbb{Z}$, $\forall i \in \mathcal{M}$. Optimization problem (52) is an integer linear program, henceforth denoted by $ILP_n(\underline{\alpha})$. We use $\mathcal{R}_n(\underline{\alpha})$ to denote the rate region of all minimizers of the above ILP, and $R_{CO,n}(\underline{\alpha})$ to denote the minimal cost.

Notice that there is a certain gap between the "information-theoretic" optimal solution to the problem $LP_1(\underline{\alpha})$ and the "data exchange" optimal solution to the problem $ILP_n(\underline{\alpha})$. The reason is that the former solution assumes that the observation length tends to infinity, while in the data exchange setting we are dealing with finite block lengths.

In this section we show how to efficiently solve $ILP_n(\underline{\alpha})$ by applying the optimization techniques we derived so far. Then, we propose a polynomial time code construction based on the matrix completion method over finite fields borrowed from the network coding literature [18].

To gain more insight into the coding scheme, let us start with the problem of finding a rate tuple that belongs to the region $\mathcal{R}_n(\underline{1})$.

A. Achieving a rate tuple from $\mathcal{R}_n(\underline{1})$

Let us consider the optimization problem $ILP_n(\underline{1})$. Observe that by applying the modified Edmonds' algorithm for any $\beta \ge R_{CO}(\underline{1})$, we obtain a feasible rate tuple that belongs to the rate region $\mathcal{R}_{de}$. Moreover, by setting $\beta$ to be a fractional number with the denominator $n$ in the problem $LP_2(\beta)$, we also get all the optimal rates to be fractional numbers with the denominator $n$. Hence, an optimal rate tuple with respect to the optimization problem $ILP_n(\underline{1})$ can be obtained by applying the modified Edmonds' algorithm for $\beta = \frac{\lceil n \cdot R_{CO}(\underline{1}) \rceil}{n}$. The next natural question is how far we are from the information-theoretic optimal solution, i.e., the one obtained when $n \to \infty$.



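Claim 1. The optimal sum rate w.r.t. $ILP_n(\underline{1})$ is at most $\frac{1}{n}$ symbols in $\mathbb{F}_q$ away from $R_{CO}(\underline{1})$:

$$R_{CO,n}(\underline{1}) - R_{CO}(\underline{1}) < \frac{1}{n}. \tag{53}$$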
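The rounding behind Claim 1 is easy to check numerically. The sketch below rounds $n \cdot R_{CO}(\underline{1})$ up to the next integer and reports the resulting gap for a few packet lengths; the value $R_{CO}(\underline{1}) = 3/2$ is only an illustrative input, and the helper name is hypothetical.

```python
from fractions import Fraction
from math import ceil

def finite_length_sum_rate(r_co, n):
    """Smallest multiple of 1/n that is at least r_co."""
    return Fraction(ceil(n * r_co), n)

r_co = Fraction(3, 2)  # illustrative omniscience rate
gaps = {}
for n in (1, 2, 3, 4):
    beta_n = finite_length_sum_rate(r_co, n)
    gaps[n] = beta_n - r_co
    print(n, beta_n, gaps[n])
```

The gap vanishes exactly when $n \cdot R_{CO}(\underline{1})$ is an integer, which is the situation discussed later in this section.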



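Example 5. Consider an example where 3 users observe packets of length $n = 2$ over the field $\mathbb{F}_q$:

$$\mathbf{X}_1 = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \mathbf{X}_2 = \begin{bmatrix} a \\ c \end{bmatrix}, \quad \mathbf{X}_3 = \begin{bmatrix} b \\ c \end{bmatrix}, \tag{54}$$

where $\mathbf{W} = [a \ \ b \ \ c]^T$ is a data packet vector in $\mathbb{F}_{q^2}^{3}$ such that $a = [a_1 \ \ a_2]$, $b = [b_1 \ \ b_2]$, $c = [c_1 \ \ c_2]$.

As pointed out above, we can think of this model as $n = 2$ repetitions of the finite linear process. Solving the problem $ILP_n(\underline{1})$ for this example, we obtain $R_1 = R_2 = R_3 = \frac{1}{2}$. Moreover, we obtain the same rate allocation for $LP_1(\underline{1})$, which suggests that in this case there is no gap in optimality between the finite and infinite observation length.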

In Theorem 6 we showed that the network coding solution can achieve any rate tuple that belongs to $\mathcal{R}_{de}$ and, hence, it also achieves any rate tuple from $\mathcal{R}_n(\underline{1})$. It is not hard to see that one possible solution for this example is: user 1 transmits $a_1 + b_2$, user 2 transmits $c_1 + a_2$, and user 3 transmits $b_1 + c_2$.
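That these three transmissions achieve omniscience can be verified with a rank computation: each user must end up with six linearly independent combinations of the symbols $a_1, a_2, b_1, b_2, c_1, c_2$. The sketch below does this over a prime field; the field size $p = 5$ is an assumption made only for the illustration, and `rank_mod_p` is a plain Gaussian elimination written for this purpose.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p."""
    rows = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p
                           for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

SYMBOLS = ["a1", "a2", "b1", "b2", "c1", "c2"]

def vec(*names):
    """Coefficient vector of the sum of the named symbols."""
    return [1 if s in names else 0 for s in SYMBOLS]

side_info = {
    1: [vec("a1"), vec("a2"), vec("b1"), vec("b2")],
    2: [vec("a1"), vec("a2"), vec("c1"), vec("c2")],
    3: [vec("b1"), vec("b2"), vec("c1"), vec("c2")],
}
broadcast = [vec("a1", "b2"), vec("c1", "a2"), vec("b1", "c2")]

p = 5  # assumed prime field size, chosen only for this illustration
ranks = {u: rank_mod_p(known + broadcast, p) for u, known in side_info.items()}
print(ranks)  # {1: 6, 2: 6, 3: 6}
```

Each user already knows one of the two symbols in every transmission it did not send, so decoding amounts to a single subtraction per received symbol.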
B. Code Construction

The next question that arises from this analysis is how to design the actual transmissions of each user. Starting from an optimal (integer) rate allocation, we construct the corresponding multicast network (see Figure 8). Then, using polynomial time algorithms for the multicast code construction [19], [18], we can solve for the actual transmissions of each user. We illustrate the conversion of the data exchange problem to the multicast problem by considering the source model in Example 5. The extension to an arbitrary linear source model is then straightforward.

In this construction, notice that there are 4 different types of nodes. Conversion of our problem into the multicast problem assumes the existence of a super-node, here denoted by $S$, that possesses all the packets. In the original problem, each user in the system plays the role of both a transmitter and a receiver. To distinguish between these two roles, we denote by $s_1$, $s_2$ and $s_3$ the "sending" nodes, and by $r_1$, $r_2$ and $r_3$ the "receiving" nodes, which correspond to users 1, 2 and 3 in the original system, respectively.

Node $S$, therefore, feeds its information to the nodes $s_1$, $s_2$ and $s_3$. Unlike the multicast problem, where any linear combination of the packets can be transmitted from node $S$ to $s_1$, $s_2$ and $s_3$, here the transmitted packets correspond to the observations of users 1, 2 and 3, respectively. The second layer of the network is designed based on the optimal rates $R_1$, $R_2$ and $R_3$. Since $n = 2$, each user gets to transmit 1 symbol in $\mathbb{F}_q$. It is clear that all the receiving users are getting two different types of information:

1) The side information that each user already has. In the multicast network this information is transmitted directly from node $s_i$ to node $r_i$, $i = 1, 2, 3$.

2) The information that each node $r_i$ receives from the other nodes $s_j$, $j \ne i$.

To model the second type of information, let us consider the nodes $r_2$ and $r_3$.

Due to the broadcast nature of the channel, both $r_2$ and $r_3$ are receiving the same symbol in $\mathbb{F}_q$ from node $s_1$. Thus, it is necessary to introduce a dummy node $t_1$ to model this constraint. The capacities of the links $s_1 - t_1$, $t_1 - r_2$ and $t_1 - r_3$ are all equal to 1 symbol in $\mathbb{F}_q$. Note that this constraint ensures that the nodes $r_2$ and $r_3$ are obtaining the same 1 symbol from $s_1$. The remaining edges are designed in a similar way.
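As a rough sanity check of this construction, the sketch below builds the layered network for Example 5 and computes the maximum flow from the super-node to every receiver with a textbook Edmonds-Karp routine. Two assumptions are made here that the text does not spell out: the capacities of the links $S - s_i$ and $s_i - r_i$ are taken to be the number of $\mathbb{F}_q$ symbols user $i$ observes (4 in this example), and a max-flow of $N \cdot n = 6$ is treated only as a necessary condition, since the transmissions on $S - s_i$ are fixed by the source model. All names are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:   # breadth-first search
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:          # find the path bottleneck
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:          # push flow along the path
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def build_network(observed, tx_symbols):
    """Nodes: S (super-node), s_i (senders), t_i (dummy), r_i (receivers)."""
    cap = defaultdict(dict)
    for i in observed:
        cap["S"][f"s{i}"] = observed[i]        # assumed: user i's observations
        cap[f"s{i}"][f"r{i}"] = observed[i]    # assumed: side information link
        cap[f"s{i}"][f"t{i}"] = tx_symbols[i]  # n * R_i broadcast symbols
        for j in observed:
            if j != i:
                cap[f"t{i}"][f"r{j}"] = tx_symbols[i]
    return cap

observed = {1: 4, 2: 4, 3: 4}   # F_q symbols observed by each user
tx = {1: 1, 2: 1, 3: 1}         # n * R_i with n = 2, R_i = 1/2
net = build_network(observed, tx)
flows = {i: max_flow(net, "S", f"r{i}") for i in observed}
print(flows)  # {1: 6, 2: 6, 3: 6}
```

Every receiver gets 4 symbols over its side-information link and 1 symbol through each of the two dummy nodes of the other users, for a total of 6.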

Now that we have a well-defined network, it only remains to determine the transmissions on all the edges. If we want to apply Jaggi's algorithm [19], the first step is to determine disjoint paths from the super-node $S$ to each of the receivers $r_1$, $r_2$, $r_3$ using the Ford-Fulkerson algorithm [20]. While the solution to this problem is easy in the case when each user observes only a subset of the packets (like in this example), it is not trivial to find disjoint paths which connect linearly independent sources to the receivers (see Figure 8). For that reason we apply Harvey's algorithm [18], which is based on a matrix representation of the transmissions in the network [21], [22] and on the simultaneous matrix completion problem over finite fields.

Fig. 8. Multicast network constructed from the source model and the optimal rate tuple $R_1 = R_2 = R_3 = \frac{1}{2}$ that belongs to $\mathcal{R}_2(\underline{1})$. Each user receives side information from "itself" (through the links $s_i - r_i$, $i = 1, 2, 3$) and from the other users (through the links $t_i - r_j$, $i, j \in \{1, 2, 3\}$, $i \ne j$).

In [21], the authors derived the transfer matrix $\mathbf{M}(r_i)$ from the super-node $S$ to any receiver $r_i$, $i = 1, 2, \ldots, m$. It is a square matrix with the input vector $\mathbf{W}$ and the output vector corresponding to the observations at the receiver $r_i$:

$$\mathbf{M}(r_i) = \mathbf{A}(\mathbf{I} - \mathbf{\Gamma})^{-1}\mathbf{B}(r_i), \quad i = 1, 2, \ldots, m, \tag{55}$$

where $\mathbf{A}$ is the source matrix, $\mathbf{\Gamma}$ is the adjacency matrix of the multicast network, and $\mathbf{B}(r_i)$ is the output matrix. For more details on how these matrices are constructed, we refer the interested reader to [21]. Here, we just make a comment on the source matrix $\mathbf{A}$. In general, its number of columns equals $\ell$, the total number of edges in the network. The input to the matrix $\mathbf{A}$ is the vector of independent packets $\mathbf{W}$. For the source model in Figure 8, the non-zero entries in the matrix $\mathbf{A}$ correspond to the edges $S - s_1$, $S - s_2$ and $S - s_3$. Since transmissions on those edges are already assigned by the
underlying source model, in general we have

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1^{T} & \mathbf{A}_2^{T} & \cdots & \mathbf{A}_m^{T} & \mathbf{0} & \cdots \end{bmatrix}, \tag{56}$$

where $\mathbf{A}_i$ corresponds to the observation matrix defined in (51).

Essentially, a multicast problem has a network coding solution if and only if each matrix $\mathbf{M}(r_i)$ is non-singular. In [18], the author showed that for the expanded transfer matrix defined as

$$\mathbf{E}(r_i) = \begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{I} - \mathbf{\Gamma} & \mathbf{B}(r_i) \end{bmatrix}, \quad i = 1, 2, \ldots, m, \tag{57}$$

it holds that $\det(\mathbf{M}(r_i)) = \pm \det(\mathbf{E}(r_i))$.

It should be noted that some of the entries in the matrices $\mathbf{\Gamma}$ and $\mathbf{B}(r_i)$, $i = 1, 2, \ldots, m$, are unknowns. To obtain the actual transmissions on all the edges it is necessary to replace those unknown entries with elements of $\mathbb{F}_q$ such that all matrices $\mathbf{E}(r_i)$, $i = 1, 2, \ldots, m$, have full rank. This is known as the simultaneous matrix completion problem, and it is solved in [18] in polynomial time.

Lemma 9 (Harvey, [18]). A polynomial time solution for the simultaneous matrix completion problem exists if and only if $|\mathbb{F}_q| > m$. The complexity of the proposed algorithm applied to the data exchange problem is polynomial in $m$, $N$ and $n$, up to a factor of $\log(m \cdot N \cdot n)$.

The complexity of the code construction can be further reduced when, for the rate tuple $(R_1, R_2, \ldots, R_m) \in \mathcal{R}_n(\underline{1})$, it holds that the greatest common divisor $\gcd(nR_1, nR_2, \ldots, nR_m) > 1$. In this case, for every $\tilde{n} = \frac{n}{\gcd(nR_1, nR_2, \ldots, nR_m)}$ generations of the finite linear process, each user still transmits an integer number of symbols in $\mathbb{F}_q$. Hence, it is enough to construct a coding scheme for $\tilde{n}$ observations of the linear process, and then to apply this scheme $\frac{n}{\tilde{n}}$ times to solve the data exchange problem. From Lemma 9, the complexity of such a scheme is that of the code construction with $n$ replaced by $\tilde{n}$.

C. Asymptotic optimality of $R_{CO,n}(\underline{1})$

In this section we consider under which conditions there is no gap between the solution of the problem $ILP_n(\underline{1})$, when $n$ is finite, and the solution of $LP_1(\underline{1})$ (the asymptotic solution, $n \to \infty$). To that end, let us consider the following lemma.

Lemma 10. The optimal rate $R_{CO}(\underline{1})$ of the problem $LP_1(\underline{1})$ can be expressed as

$$R_{CO}(\underline{1}) = H(X_{\mathcal{M}}) - \min_{\mathcal{P}} \left\{ \frac{\sum_{S \in \mathcal{P}} H(X_S) - H(X_{\mathcal{M}})}{|\mathcal{P}| - 1} \right\}, \quad \mathcal{P} \text{ is a partition of } \mathcal{M} \text{ s.t. } |\mathcal{P}| \ge 2. \tag{58}$$

Proof of Lemma 10 is provided in Appendix F. It is based on the geometry of the function $g(\mathcal{M},\beta)$. Minimization (58) was also shown in [11] by considering an LP dual of the optimization problem $LP_1(\underline{1})$.

From Lemma 10, $R_{CO}(\underline{1})$ can be expressed as a rational number. Moreover, the denominator of $R_{CO}(\underline{1})$ can be some integer between 1 and $m - 1$, depending on the cardinality of the optimal partition according to (58). From Lemma 3 it immediately follows that all $(R_1, R_2, \ldots, R_m) \in \mathcal{R}_n(\underline{1})$ are also rational numbers with the denominator $n$.

To that end, if $n$ is divisible by $|\mathcal{P}(R_{CO}(\underline{1}))| - 1$ for $|\mathcal{P}(R_{CO}(\underline{1}))| \ge 2$, then

$$R_{CO,n}(\underline{1}) = R_{CO}(\underline{1}). \tag{59}$$
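As a worked illustration of (58) and (59), consider the three-user source model of Example 5 with entropies measured in packets, so that $H(X_{\mathcal{M}}) = 3$, $H(X_i) = 2$ for every user $i$, and $H(X_{\{i,j\}}) = 3$ for every pair. The only partitions with $|\mathcal{P}| \ge 2$ are the singleton partition and the three partitions of the form $\{\{i,j\},\{k\}\}$; the computation below is a sketch under these assumptions.

```latex
\begin{align*}
\mathcal{P} = \{\{1\},\{2\},\{3\}\}: \quad
  & \frac{\sum_{S \in \mathcal{P}} H(X_S) - H(X_{\mathcal{M}})}{|\mathcal{P}| - 1}
    = \frac{2 + 2 + 2 - 3}{2} = \frac{3}{2}, \\
\mathcal{P} = \{\{i,j\},\{k\}\}: \quad
  & \frac{\sum_{S \in \mathcal{P}} H(X_S) - H(X_{\mathcal{M}})}{|\mathcal{P}| - 1}
    = \frac{3 + 2 - 3}{1} = 2, \\
R_{CO}(\underline{1})
  & = H(X_{\mathcal{M}}) - \min\Big\{\tfrac{3}{2},\, 2\Big\}
    = 3 - \frac{3}{2} = \frac{3}{2}.
\end{align*}
```

The minimizing partition has $|\mathcal{P}| - 1 = 2$, which divides $n = 2$, so (59) gives $R_{CO,2}(\underline{1}) = R_{CO}(\underline{1}) = \frac{3}{2}$, in agreement with the allocation $R_1 = R_2 = R_3 = \frac{1}{2}$ of Example 5.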

D. Achieving a rate tuple from $\mathcal{R}_n(\underline{\alpha})$

In Section VI-B we argued that once we obtain the optimal fractional rates (which denote how many symbols in $\mathbb{F}_q$ each user transmits), the construction of the corresponding multicast network is straightforward and, hence, the coding scheme can be obtained in polynomial time by using the algorithm proposed in [18]. Here, we describe an algorithm that finds an optimal solution to the optimization problem $ILP_n(\underline{\alpha})$.

In Section V we proposed the gradient descent algorithm to achieve an approximate solution to the problem $LP_1(\underline{\alpha})$. Setting the precision parameter $\epsilon = \frac{1}{n}$ guarantees that the distance between the sum rate which corresponds to the rate tuple from $\mathcal{R}(\underline{\alpha})$ and the sum rate obtained through the gradient descent algorithm is at most $\frac{1}{n}$, i.e., $|\beta_{gd} - \beta^*| < \frac{1}{n}$. Therefore, we have

$$|n\beta_{gd} - n\beta^*| < 1. \tag{60}$$

From (60) we conclude that

$$\left| \frac{\lfloor n\beta_{gd} \rfloor}{n} - \beta^* \right| < \frac{1}{n} \quad \text{or} \quad \left| \frac{\lceil n\beta_{gd} \rceil}{n} - \beta^* \right| < \frac{1}{n}. \tag{61}$$

From (61) it follows that we can achieve a rate tuple from $\mathcal{R}_n(\underline{\alpha})$ whose sum rate is at most $\frac{1}{n}$ away from $\beta^*$ by choosing $\beta = \frac{\lfloor n\beta_{gd} \rfloor}{n}$ or $\beta = \frac{\lceil n\beta_{gd} \rceil}{n}$. Let us denote by $\beta_{(n)}$ the optimal sum rate w.r.t. $ILP_n(\underline{\alpha})$. To decide which one of the proposed $\beta$'s is equal to $\beta_{(n)}$, we just need to compare the values of the function $h$ at these points:

$$\beta_{(n)} = \arg\min \left\{ h\!\left(\frac{\lfloor n\beta_{gd} \rfloor}{n}\right), h\!\left(\frac{\lceil n\beta_{gd} \rceil}{n}\right) \right\}. \tag{62}$$



Then, it follows that 



\RcoA^)-Rcoia)\<'^^^^. (63) 
The complexity of the proposed algorithm is $O(m^2 \cdot SFM(m) + \log(n \cdot N) \cdot m \cdot SFM(m))$. After obtaining the optimal communication rates w.r.t. $ILP_n(\underline{\alpha})$, it only remains to apply the code construction algorithm proposed in Subsection VI-B (see Lemma 9).

E. Asymptotic optimality of $R_{CO,n}(\underline{\alpha})$

In this section we explore under which conditions the optimal solutions of the problems $ILP_n(\underline{\alpha})$ and $LP_1(\underline{\alpha})$ are the same.

In order to obtain the asymptotically optimal rates w.r.t. $LP_1(\underline{\alpha})$, it is necessary to bound from below the length of each linear segment in $h(\beta)$. Then, by choosing the appropriate step size in the gradient descent algorithm, we can achieve the goal.

Theorem 7. An optimal asymptotic solution to the problem $LP_1(\underline{\alpha})$ in the finite linear source model can be obtained in polynomial time by using a gradient descent method with the precision parameter $\epsilon = m^{-m/2}$. The complexity of the proposed algorithm is $O((m \cdot \log m + \log N) \cdot m \cdot SFM(m))$.

Proof: In Theorem 5 we showed that each breakpoint in $h(\beta)$ corresponds to a vertex of the rate region $\mathcal{R}$. In other words, for some breakpoint $\beta_j$, the optimal rate tuple is uniquely defined by the following system of equations

$$R(S_i) = H(X_{S_i}|X_{S_i^c}), \quad i = 1, 2, \ldots, m, \tag{64}$$

where $S_i \subseteq \mathcal{M}$. Moreover, it holds that $R(\mathcal{M}) = \beta_j$. The system of linear equations (64) can be expressed in matrix form as follows:

$$\mathbf{A}\mathbf{R} = \begin{bmatrix} H(X_{S_1}|X_{S_1^c}) & H(X_{S_2}|X_{S_2^c}) & \cdots & H(X_{S_m}|X_{S_m^c}) \end{bmatrix}^{T}, \tag{65}$$

where

$$a_{i,j} = \begin{cases} 1 & \text{if } j \in S_i, \\ 0 & \text{otherwise.} \end{cases} \tag{66}$$

In order to obtain the optimal rate tuple which corresponds to the breakpoint $\beta_j$, we can simply invert the matrix $\mathbf{A}$. Notice that the right hand side of (65) consists of conditional entropy (rank) expressions, which are integers in the case of the linear source model. Therefore, all optimal rates $\mathbf{R}$ which correspond to the breakpoints of $h(\beta)$ are fractional numbers with the denominator equal to $\det(\mathbf{A})$. This comes from the fact that

$$\mathbf{A}^{-1} = \frac{\mathrm{adj}(\mathbf{A})}{\det(\mathbf{A})}, \tag{67}$$

where $\mathrm{adj}(\mathbf{A})$ is the adjugate of $\mathbf{A}$. From [23] it follows that

$$|\det(\mathbf{A})| \le m^{m/2}. \tag{68}$$

Therefore, all the breakpoints of $h(\beta)$ are at a distance of at least $m^{-m/2}$ from each other. Hence, by setting the precision parameter in the gradient descent algorithm to $\epsilon = m^{-m/2}$, we can make sure that the minimizer of $h(\beta)$ is an end point of the linear segment to which the approximate solution belongs. ■
In what follows, we explain how to find the minimum of $h(\beta)$ by applying a simple binary search algorithm on top of the gradient descent algorithm proposed in Theorem 7. Let us consider the scenario in Figure 9. Applying the gradient descent algorithm with the precision parameter $\epsilon = m^{-m/2}$, we can reach a point $\beta_{gd}$ that is $\epsilon$-close to the minimizer of $h(\beta)$, i.e., $|\beta_{gd} - \beta^*| < m^{-m/2}$. Applying the modified Edmonds' algorithm for $\beta = \beta_{gd}$, we obtain the parameters $(\mathbf{b}^{(gd)}, \mathbf{c}^{(gd)})$ (see (44)) which correspond to the linear segment to which $\beta_{gd}$ belongs. In order to obtain $\beta^*$ we simply need to jump to the adjacent linear segment. To that end, let $\beta_1 = \beta_{gd} - m^{-m/2}$ belong to the linear segment $(\mathbf{b}^{(1)}, \mathbf{c}^{(1)})$. Then, $\beta^*$ can be obtained by intersecting these two linear segments:

$$\beta^* = \frac{\sum_{i=1}^{m} \alpha_i c_i^{(1)} - \sum_{i=1}^{m} \alpha_i c_i^{(gd)}}{\sum_{i=1}^{m} \alpha_i b_i^{(gd)} - \sum_{i=1}^{m} \alpha_i b_i^{(1)}}. \tag{69}$$

Fig. 9. Line intersection procedure applied on top of the gradient descent algorithm to obtain the minimum $h(\beta^*)$. The horizontal axis shows $\beta$ with the points $R_{CO}(\underline{1})$, $\beta_1$, $\beta^*$ and $\beta_{gd}$ marked; the vertical axis shows $h(\beta)$.

Therefore, if the data packet length $n$ is divisible by the denominator of $\beta^*$, then $R_{CO,n}(\underline{\alpha}) = R_{CO}(\underline{\alpha})$.
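The intersection step can be written out directly from the segment representation $h(\beta) = \sum_i \alpha_i (b_i \beta + c_i)$. The sketch below is a minimal illustration; the weight vector and the two coefficient pairs are made-up numbers (chosen so that each $\mathbf{b}$ sums to 1), not values produced by the modified Edmonds' algorithm.

```python
from fractions import Fraction

def segment_intersection(alpha, b_gd, c_gd, b_1, c_1):
    """beta at which the two linear segments of h take the same value."""
    slope_gd = sum(a * b for a, b in zip(alpha, b_gd))
    slope_1 = sum(a * b for a, b in zip(alpha, b_1))
    offset_gd = sum(a * c for a, c in zip(alpha, c_gd))
    offset_1 = sum(a * c for a, c in zip(alpha, c_1))
    return Fraction(offset_1 - offset_gd, slope_gd - slope_1)

# Hypothetical data: left segment has slope -1, right segment has slope 3.
alpha = [1, 2, 3]
b_1, c_1 = [2, 0, -1], [0, 1, 2]
b_gd, c_gd = [0, 0, 1], [1, 1, 0]

beta_star = segment_intersection(alpha, b_gd, c_gd, b_1, c_1)
print(beta_star)  # 5/4
```

Because the result is an exact fraction, its denominator can be compared against the packet length $n$ to test the divisibility condition stated above.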
F. Indivisible Packets

Let us now consider the scenario when the data packets cannot be split. To obtain the optimal communication rates, we can directly apply the results from Sections VI-A and VI-D. We can think of this problem as having one packet over the very large base field $\mathbb{F}_{q^n}$.

Hence, for the case when $\underline{\alpha} = \underline{1}$, it holds that

$$R_{CO,1}(\underline{1}) = \lceil R_{CO}(\underline{1}) \rceil \ \text{symbols in } \mathbb{F}_{q^n}.$$

Similarly, we can obtain the sum rate which corresponds to the optimal $R_{CO,1}(\underline{\alpha})$ as follows:

$$\beta_{(1)} = \arg\min \{ h(\lfloor \beta_{gd} \rfloor), h(\lceil \beta_{gd} \rceil) \} \ \text{symbols in } \mathbb{F}_{q^n}.$$

However, in the actual coding scheme, all the algebraic operations are performed over the original base field $\mathbb{F}_q$.

VII. Conclusion 

In this work we addressed the data exchange problem, where each user in the system possesses some partial knowledge (side information) about a file that is of common interest. The goal is for each user to gain access to the entire file while minimizing the (possibly weighted) amount of bits that these users need to exchange over a noiseless public channel. For the general case when the side information is in the form of i.i.d. realizations of some discrete memoryless process, we provide a polynomial time algorithm that finds an optimal rate allocation w.r.t. the communication cost. Our solution is based on combinatorial optimization techniques such as optimization over submodular polyhedra, Dilworth truncation of intersecting submodular functions, and Edmonds' greedy algorithm. For the case when the side information is in the form of linearly coded packets, besides an optimal rate allocation in polynomial time, we provide efficient methods for constructing linear network codes that achieve omniscience among the users at the optimal rates with finite block lengths and zero error.

Appendix A
Proof of Lemma 1

Base polyhedron $B(f,\le)$ is defined by the following system of inequalities

$$Z(S) \le f(S), \quad \forall S \subset \mathcal{M}, \tag{70}$$
$$Z(\mathcal{M}) = f(\mathcal{M}). \tag{71}$$

This is equivalent to the following

$$Z(S^c) \ge f^*(S^c) \ \big(= f(\mathcal{M}) - f(S)\big), \tag{72}$$
$$Z(\mathcal{M}) = f^*(\mathcal{M}) \ \big(= f(\mathcal{M})\big), \tag{73}$$

where the last equality holds because $f(\emptyset) = 0$. For the second part, we have

$$(f^*)^*(S) = f^*(\mathcal{M}) - f^*(S^c) = f(\mathcal{M}) - \big(f(\mathcal{M}) - f(S)\big) = f(S).$$



Appendix B 
Proof of Lemma H] 

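Using the properties of conditional entropy, we can write $f(S, \beta) = \beta - H(X_{\mathcal{M}}) + H(X_S)$. When 
$S \cap T \neq \emptyset$, the following inequality holds due to the submodularity of entropy: 

$$\begin{aligned} f(S, \beta) + f(T, \beta) &= H(X_S) + H(X_T) - 2\left(H(X_{\mathcal{M}}) - \beta\right) \\ &\geq H(X_{S \cup T}) + H(X_{S \cap T}) - 2\left(H(X_{\mathcal{M}}) - \beta\right) = f(S \cup T, \beta) + f(S \cap T, \beta). \end{aligned} \tag{74}$$

Inequality (74) holds whenever $S \cap T \neq \emptyset$. To show that the function $f$ is submodular when $\beta \geq H(X_{\mathcal{M}})$, 
it only remains to consider the case $S \cap T = \emptyset$. Since $f(\emptyset, \beta) = 0$, we have 

$$\begin{aligned} f(S, \beta) + f(T, \beta) &= H(X_S) + H(X_T) - 2\left(H(X_{\mathcal{M}}) - \beta\right) \\ &\geq H(X_S, X_T) - \left(H(X_{\mathcal{M}}) - \beta\right) = f(S \cup T, \beta). \end{aligned} \tag{75}$$

The inequality in (75) follows from the fact that 

$$H(X_S) + H(X_T) - H(X_S, X_T) = I(X_S; X_T) \geq 0 \geq H(X_{\mathcal{M}}) - \beta. \tag{76}$$

This completes the proof. 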
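
The distinction between the two cases of the proof can be seen on a toy instance. The sketch below is illustrative only: it assumes uncoded, uniformly distributed packets, so that $H(X_S)$, measured in packets, equals the number of distinct packets held by the users in $S$; the instance `PACKETS` and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical side information: user index -> indices of the packets it holds.
PACKETS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
M = frozenset(PACKETS)


def H(S):
    """Entropy of X_S in packets, assuming uncoded and uniformly distributed packets."""
    return len(set().union(*(PACKETS[i] for i in S))) if S else 0


def f(S, beta):
    """f(S, beta) = beta - H(X_M) + H(X_S) for nonempty S, and f(empty, beta) = 0."""
    return beta - H(M) + H(S) if S else 0


def nonempty_subsets(ground):
    """Yield every nonempty subset of `ground` as a frozenset."""
    items = sorted(ground)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_submodular(beta, intersecting_only):
    """Check f(S)+f(T) >= f(S|T)+f(S&T) over all pairs (or only intersecting pairs)."""
    subs = list(nonempty_subsets(M))
    for S in subs:
        for T in subs:
            if intersecting_only and not (S & T):
                continue
            if f(S, beta) + f(T, beta) < f(S | T, beta) + f(S & T, beta):
                return False
    return True


# Below H(X_M) = 4 only intersecting pairs are guaranteed to satisfy the inequality.
print(is_submodular(2, intersecting_only=True))
print(is_submodular(2, intersecting_only=False))
print(is_submodular(4, intersecting_only=False))
```

For $\beta = 2$ the disjoint pair $S = \{0\}$, $T = \{2\}$ violates the inequality, which is exactly the case handled by (75) and (76) once $\beta \geq H(X_{\mathcal{M}})$.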

Appendix C 
Proof of Lemma 5 

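Let us define the function $g(\mathcal{M}, \beta, i)$, $i = 1, 2, \ldots, m$, as follows: 

$$g(\mathcal{M}, \beta, i) = \min_{\mathcal{P}} \left\{ \sum_{S \in \mathcal{P}} \left( \beta - H(X_{S^c} | X_S) \right), \ \text{s.t.} \ |\mathcal{P}| = i : \mathcal{P} \text{ is a partition of } \mathcal{M} \right\}. \tag{77}$$

The function $g(\mathcal{M}, \beta, i)$ is linear in $\beta$ for any fixed $i = 1, 2, \ldots, m$. Then, the Dilworth truncation $g(\mathcal{M}, \beta)$ 
can be written as 

$$g(\mathcal{M}, \beta) = \min_{i = 1, 2, \ldots, m} g(\mathcal{M}, \beta, i). \tag{78}$$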



Note that the minimizing partition in (77) does not depend on $\beta$, since (77) can be written as 

$$g(\mathcal{M}, \beta, i) = i \left( \beta - H(X_{\mathcal{M}}) \right) + \min_{\mathcal{P}} \left\{ \sum_{S \in \mathcal{P}} H(X_S), \ \text{s.t.} \ |\mathcal{P}| = i : \mathcal{P} \text{ is a partition of } \mathcal{M} \right\}. \tag{79}$$

Therefore, $g(\mathcal{M}, \beta)$ can be evaluated for any given $\beta$ by minimizing over all $m$ lines $g(\mathcal{M}, \beta, i)$, $i = 1, 2, \ldots, m$. 
Hence, $g(\mathcal{M}, \beta)$ has at most $m$ linear segments. Moreover, due to the minimization (78), $g(\mathcal{M}, \beta)$ 
has non-increasing slope (see Figure 10). 

Fig. 10. Function $g(\mathcal{M}, \beta)$ is piecewise linear in $\beta$. It can be obtained by minimization over $m$ linear functions $g(\mathcal{M}, \beta, i)$, 
$i = 1, 2, \ldots, m$. $g(\mathcal{M}, \beta)$ has non-increasing slope, i.e., $1 < j < k < \cdots < l < m$. 

To verify that the last linear segment of $g(\mathcal{M}, \beta)$ has slope 1, it is sufficient to find a point $\beta$ at which 
the function $g(\mathcal{M}, \beta)$ has slope 1. To that end, let us consider $\beta = H(X_{\mathcal{M}})$. From Lemma 4 it follows that 
$f(S, \beta = H(X_{\mathcal{M}}))$ is a submodular function, and hence $g(\mathcal{M}, \beta = H(X_{\mathcal{M}})) = f(\mathcal{M}, \beta = H(X_{\mathcal{M}})) = \beta$, 
where the last equality follows from (29). Therefore, the slope of $g(\mathcal{M}, \beta)$ at $\beta = H(X_{\mathcal{M}})$ is 1, which 
completes the proof. 
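
The piecewise-linear structure described in this proof can be reproduced by brute force for a small number of users. The following sketch is only an illustration under stated assumptions: it enumerates all partitions (exponential in $m$, unlike the polynomial-time method of the paper), uses a hypothetical instance `PACKETS`, and takes $H(X_S)$ to be the number of distinct uncoded packets held by $S$.

```python
# Hypothetical side information: user index -> indices of the packets it holds.
PACKETS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
M = sorted(PACKETS)


def H(S):
    """Entropy of X_S in packets, assuming uncoded and uniformly distributed packets."""
    return len(set().union(*(PACKETS[i] for i in S))) if S else 0


def partitions(items):
    """Yield every set partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms its own block ...
        yield [frozenset([first])] + part
        # ... or it joins one of the existing blocks.
        for k in range(len(part)):
            yield part[:k] + [part[k] | {first}] + part[k + 1:]


# By (79), g(M, beta, i) = i * beta + c_i with an intercept c_i independent of beta.
LINES = {
    i: min(sum(H(S) for S in P) for P in partitions(M) if len(P) == i) - i * H(M)
    for i in range(1, len(M) + 1)
}


def g(beta):
    """Dilworth truncation g(M, beta) as the lower envelope (78) of the m lines."""
    return min(i * beta + c for i, c in LINES.items())


# Unit-step slopes of g; they should never increase and should end at 1.
slopes = [g(b + 1) - g(b) for b in range(0, 7)]
print(LINES, slopes)
```

On this instance the three lines are $\beta$, $2\beta - 3$ and $3\beta - 6$, which all meet at $\beta = 3$.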
Appendix D 

Optimal Partitioning w.r.t. Dilworth Truncation 

In [16] it was shown how to obtain an optimal partition $\mathcal{P}(\beta)$ of the set $\mathcal{M}$ w.r.t. (34) from the 
modified Edmonds' algorithm. Here we provide the intuition behind these results. From Remark 3 it follows 
that $g(\mathcal{M}, \beta)$ is the optimal value of the optimization problem LP$_2(\beta)$. As we pointed out in Section III, 
in each iteration $i$ of the modified Edmonds' algorithm we obtain a set $S_i$ for which the inequality 
constraint in $P(f, \beta)$ holds with equality. In the next claim we state a result that is crucial for obtaining 
an optimal partition of $\mathcal{M}$ with respect to the Dilworth truncation of $f(\mathcal{M}, \beta)$. 

Claim 2. For an optimal solution $Z$ of the problem LP$_2(\beta)$, if $Z(S_1) = f(S_1)$ and $Z(S_2) = f(S_2)$, then 
$Z(S_1 \cup S_2) = f(S_1 \cup S_2)$. 

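Proof: For an optimal rate vector $Z$ of the problem LP$_2(\beta)$ we have 

$$Z(S_i) = \beta - H(X_{S_i^c} | X_{S_i}) = \beta - H(X_{\mathcal{M}}) + H(X_{S_i}), \tag{80}$$
$$Z(S_j) = \beta - H(X_{S_j^c} | X_{S_j}) = \beta - H(X_{\mathcal{M}}) + H(X_{S_j}). \tag{81}$$

Since LP$_2(\beta)$ represents optimization over the polyhedron $P(f, \beta)$, it holds that 

$$Z(S_i \cup S_j) \leq \beta - H(X_{\mathcal{M}}) + H(X_{S_i}, X_{S_j}), \tag{82}$$
$$Z(S_i \cap S_j) \leq \beta - H(X_{\mathcal{M}}) + H(X_{S_i \cap S_j}). \tag{83}$$

From (80) and (81) it follows that 

$$\begin{aligned} Z(S_i \cup S_j) &= Z(S_i) + Z(S_j) - Z(S_i \cap S_j) \\ &= \beta - H(X_{\mathcal{M}}) + H(X_{S_i}) + \beta - H(X_{\mathcal{M}}) + H(X_{S_j}) - Z(S_i \cap S_j) \\ &\geq \beta - H(X_{\mathcal{M}}) + H(X_{S_i}) + H(X_{S_j}) - H(X_{S_i \cap S_j}), \end{aligned} \tag{84}$$

where the last step in (84) follows from (83). Due to the submodularity of entropy, it directly follows from 
(84) that 

$$Z(S_i \cup S_j) \geq \beta - H(X_{\mathcal{M}}) + H(X_{S_i}, X_{S_j}). \tag{85}$$

Comparing (82) and (85), it must hold that 

$$Z(S_i \cup S_j) = \beta - H(X_{\mathcal{M}}) + H(X_{S_i}, X_{S_j}). \tag{86}$$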
The result of Claim 2 is a key building block for obtaining an optimal partition $\mathcal{P}(\beta)$ for some 
fixed $\beta$ (see Algorithm 3). From Remark 3 it follows that for the maximizer rate vector $Z$ of the problem 
LP$_2(\beta)$ it holds that 

$$Z(S) = f(S, \beta), \quad \forall S \in \mathcal{P}(\beta). \tag{87}$$

From Claim 2 and (87) it follows that for the sets $S_i$ and $S_j$, which are the minimizer sets in iterations 
$i$ and $j$ of the modified Edmonds' algorithm, if $S_i \cap S_j \neq \emptyset$, then $S_i \cup S_j$ is a subset of one of the 
partition sets in $\mathcal{P}(\beta)$. Therefore, in each iteration of the modified Edmonds' algorithm, whenever the 
minimizer set intersects some of the previously obtained sets, they must all belong to the same partition 
set (see steps 4 and 5 in Algorithm 3). 

Algorithm 3 Optimal Partition [16] 
1: Let $j(1), j(2), \ldots, j(m)$ be any ordering of $\{1, 2, \ldots, m\}$, and $A_i = \{j(1), j(2), \ldots, j(i)\}$. 
2: Initialize $\mathcal{P}^0 = \emptyset$. 
3: for $i = 1$ to $m$ do 
4:   Let $S_i$ be the minimizer of $Z_{j(i)} = \min\{f(S, \beta) - Z(S) : j(i) \in S, \ S \subseteq A_i\}$. 
5:   $T_i = S_i \cup \left[ \bigcup \{V : V \in \mathcal{P}^{i-1}, \ S_i \cap V \neq \emptyset\} \right]$ 
6:   $\mathcal{P}^i = \{T_i\} \cup \{V : V \in \mathcal{P}^{i-1}, \ S_i \cap V = \emptyset\}$ 
7: end for 
8: $\mathcal{P}(\beta) = \mathcal{P}^m$. 

Compared to the modified Edmonds' algorithm, Algorithm 3 has two additional steps in each iteration 
(step 5 and step 6). Thus, the order of complexity of both algorithms is the same, namely 
$O(m \cdot SFM(m))$. The complete explanation of Algorithm 3 can be found in [16]. 
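
The merging procedure of Algorithm 3 can be sketched in a few lines for a toy instance. This is a hedged illustration, not the implementation referred to in the paper: the submodular function minimization in step 4 is replaced by exhaustive search (exponential in $m$), the instance `PACKETS` is hypothetical, and $H(X_S)$ is assumed to be the number of distinct uncoded packets held by $S$. The brute-force helper `dilworth_bruteforce` is included only to cross-check the result.

```python
from itertools import combinations

# Hypothetical side information: user index -> indices of the packets it holds.
PACKETS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
M = sorted(PACKETS)


def H(S):
    """Entropy of X_S in packets, assuming uncoded and uniformly distributed packets."""
    return len(set().union(*(PACKETS[i] for i in S))) if S else 0


def f(S, beta):
    """f(S, beta) = beta - H(X_M) + H(X_S) for a nonempty set S."""
    return beta - H(M) + H(S)


def optimal_partition(beta, order):
    """Greedy rate vector Z and partition P(beta); step 4 is done by exhaustive search."""
    Z, partition = {}, []
    for i, user in enumerate(order):
        earlier = order[:i]
        best_val, best_S = None, None
        # Step 4: minimize f(S, beta) - Z(S) over sets S containing j(i), S within A_i.
        for size in range(len(earlier) + 1):
            for combo in combinations(earlier, size):
                S = frozenset(combo) | {user}
                val = f(S, beta) - sum(Z[u] for u in combo)
                if best_val is None or val < best_val:
                    best_val, best_S = val, S
        Z[user] = best_val
        # Steps 5 and 6: merge the minimizer with every earlier block it intersects.
        merged = best_S.union(*(V for V in partition if V & best_S))
        partition = [merged] + [V for V in partition if not (V & best_S)]
    return Z, partition


def partitions(items):
    """Yield every set partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [frozenset([first])] + part
        for k in range(len(part)):
            yield part[:k] + [part[k] | {first}] + part[k + 1:]


def dilworth_bruteforce(beta):
    """Reference value: minimum of sum_{S in P} f(S, beta) over all partitions P of M."""
    return min(sum(f(S, beta) for S in P) for P in partitions(M))


Z, P = optimal_partition(5, [0, 1, 2])
print(Z, P, dilworth_bruteforce(5))
```

When several sets attain the minimum in step 4, this sketch keeps the first one found; different tie-breaking may return a different, equally optimal partition.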



Appendix E 



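Proof of Lemma 6 

The function $h(\beta)$ is given by 

$$h(\beta) = \min_{\mathbf{R}} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t.} \ R(\mathcal{M}) = \beta, \ R(S) \geq H(X_S | X_{S^c}), \ \forall S \subset \mathcal{M}. \tag{88}$$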
Continuity of $h(\beta)$ 

Let the rate tuple $(R_1^{(1)}, R_2^{(1)}, \ldots, R_m^{(1)})$ be a minimizer of the problem (88) for $\beta = \beta_1$, i.e., 
$\sum_{i=1}^{m} R_i^{(1)} = \beta_1$. Then, for a point $\beta_2 = \beta_1 + \Delta\beta$, let us construct the rate tuple 

$$R_i^{(2)} = \begin{cases} R_i^{(1)} + \Delta\beta & \text{if } i = 1, \\ R_i^{(1)} & \text{if } i \neq 1. \end{cases} \tag{89}$$

Then $(R_1^{(2)}, R_2^{(2)}, \ldots, R_m^{(2)})$ is a feasible rate tuple for the optimization problem (88) when $\beta = \beta_2$. 
Moreover, it holds that $h(\beta_2) - h(\beta_1) \leq \alpha_1 \Delta\beta$. Hence, 

$$|\beta_2 - \beta_1| \leq \Delta\beta \ \Rightarrow \ |h(\beta_2) - h(\beta_1)| \leq \alpha_1 \Delta\beta. \tag{90}$$

Since $\alpha_1 < \infty$ by the model assumption, it immediately follows that the function $h(\beta)$ is continuous. 

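Convexity of $h(\beta)$ 

Consider two points $\beta_1$ and $\beta_2$ such that $\beta_i \geq R_{CO}(1)$, $i = 1, 2$. We want to show that for any 
$\lambda \in [0, 1]$ it holds that $h(\lambda\beta_1 + (1 - \lambda)\beta_2) \leq \lambda h(\beta_1) + (1 - \lambda) h(\beta_2)$. To that end, let $\mathbf{R}^{(1)}$ and $\mathbf{R}^{(2)}$ be 
the optimal rate tuples w.r.t. $h(\beta_1)$ and $h(\beta_2)$, respectively. Now, we show that $\mathbf{R} = \lambda \mathbf{R}^{(1)} + (1 - \lambda) \mathbf{R}^{(2)}$ 
is a feasible rate tuple for the problem (88) when $\beta = \lambda\beta_1 + (1 - \lambda)\beta_2$. 

Since $R^{(1)}(\mathcal{M}) = \beta_1$ and $R^{(2)}(\mathcal{M}) = \beta_2$, it follows that 

$$R(\mathcal{M}) = \lambda R^{(1)}(\mathcal{M}) + (1 - \lambda) R^{(2)}(\mathcal{M}) = \lambda\beta_1 + (1 - \lambda)\beta_2. \tag{91}$$

Since $R^{(1)}(S) \geq H(X_S | X_{S^c})$ and $R^{(2)}(S) \geq H(X_S | X_{S^c})$, $\forall S \subset \mathcal{M}$, we have 

$$R(S) = \lambda R^{(1)}(S) + (1 - \lambda) R^{(2)}(S) \geq H(X_S | X_{S^c}). \tag{92}$$

From (91) and (92) it follows that $\mathbf{R}$ is a feasible rate tuple w.r.t. the optimization problem (88). Therefore, 
$\sum_{i=1}^{m} \alpha_i R_i \geq h(\lambda\beta_1 + (1 - \lambda)\beta_2)$. Since the cost is linear in the rates, the left-hand side equals 
$\lambda h(\beta_1) + (1 - \lambda) h(\beta_2)$. Hence, 

$$h(\lambda\beta_1 + (1 - \lambda)\beta_2) \leq \lambda h(\beta_1) + (1 - \lambda) h(\beta_2), \tag{93}$$

which completes the proof. 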
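
The convexity argument can be illustrated numerically on a toy instance. The sketch below makes several assumptions that are not taken from the paper: the instance `PACKETS` and the weights `ALPHA` are hypothetical, $H(X_S)$ is the number of distinct uncoded packets held by $S$, and the minimization in (88) is restricted to integer rate tuples and solved by exhaustive search rather than by linear programming.

```python
from itertools import combinations, product

# Hypothetical instance: users are indexed 0..m-1 so a rate tuple R can be indexed by user.
PACKETS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
ALPHA = {0: 3, 1: 1, 2: 2}  # hypothetical transmission weights alpha_i
M = frozenset(PACKETS)


def H(S):
    """Entropy of X_S in packets, assuming uncoded and uniformly distributed packets."""
    return len(set().union(*(PACKETS[i] for i in S))) if S else 0


def cond_H(S):
    """H(X_S | X_{S^c}) = H(X_M) - H(X_{S^c})."""
    return H(M) - H(M - S)


# All nonempty proper subsets of M, used for the cut-set constraints in (88).
PROPER = [
    frozenset(c) for size in range(1, len(M)) for c in combinations(sorted(M), size)
]


def h(beta):
    """Minimum weighted cost over integer rate tuples with sum-rate beta (None if infeasible)."""
    best = None
    for R in product(range(beta + 1), repeat=len(M)):
        if sum(R) != beta:
            continue
        if all(sum(R[i] for i in S) >= cond_H(S) for S in PROPER):
            cost = sum(ALPHA[i] * R[i] for i in M)
            best = cost if best is None else min(best, cost)
    return best


values = {beta: h(beta) for beta in range(2, 8)}
print(values)
```

On this instance the problem is infeasible for $\beta = 2$, and the computed values grow linearly for $\beta \geq 3$, which is consistent with convexity on the integer grid.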
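Appendix F 
Proof of Lemma 10 

For $\beta = R_{CO}(1)$ the partition consisting of the single set $\mathcal{M}$ is optimal, i.e., $|\mathcal{P}(\beta)| = 1$. Since $\beta = R_{CO}(1)$ is also a breakpoint of $g(\mathcal{M}, \beta)$ (see 
Lemma 5), there is also an optimal partition with $|\mathcal{P}(\beta)| \geq 2$. In other words, the optimal partition of the set $\mathcal{M}$ is not unique. From 
(34) and (35) we can write the expression for $R_{CO}(1)$ as follows: 

$$R_{CO}(1) = |\mathcal{P}(R_{CO}(1))| \, R_{CO}(1) - \sum_{S \in \mathcal{P}(R_{CO}(1))} H(X_{S^c} | X_S). \tag{94}$$

Rearranging the terms in (94) and using $H(X_{S^c} | X_S) = H(X_{\mathcal{M}}) - H(X_S)$, we get 

$$\left( |\mathcal{P}(R_{CO}(1))| - 1 \right) R_{CO}(1) = |\mathcal{P}(R_{CO}(1))| \, H(X_{\mathcal{M}}) - \sum_{S \in \mathcal{P}(R_{CO}(1))} H(X_S). \tag{95}$$

Dividing both sides of the equality by $\left( |\mathcal{P}(R_{CO}(1))| - 1 \right)$ we obtain 

$$R_{CO}(1) = \frac{|\mathcal{P}(R_{CO}(1))| \, H(X_{\mathcal{M}}) - \sum_{S \in \mathcal{P}(R_{CO}(1))} H(X_S)}{|\mathcal{P}(R_{CO}(1))| - 1}. \tag{96}$$

This completes the proof, since $|\mathcal{P}(R_{CO}(1))| \geq 2$ ensures that the division is well defined. 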
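
The ratio obtained after the division can be evaluated by brute force on a small instance. The sketch below is an illustration under stated assumptions: it relies on the standard characterization of the minimum sum-rate for omniscience as the maximum of this ratio over all partitions with at least two blocks, it enumerates partitions exhaustively, and it uses a hypothetical instance `PACKETS` in which $H(X_S)$ is the number of distinct uncoded packets held by $S$.

```python
from fractions import Fraction

# Hypothetical side information: user index -> indices of the packets it holds.
PACKETS = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
M = sorted(PACKETS)


def H(S):
    """Entropy of X_S in packets, assuming uncoded and uniformly distributed packets."""
    return len(set().union(*(PACKETS[i] for i in S))) if S else 0


def partitions(items):
    """Yield every set partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [frozenset([first])] + part
        for k in range(len(part)):
            yield part[:k] + [part[k] | {first}] + part[k + 1:]


def min_sum_rate():
    """Maximize (|P| H(X_M) - sum_{S in P} H(X_S)) / (|P| - 1) over partitions with |P| >= 2."""
    best_value, best_partition = None, None
    for P in partitions(M):
        if len(P) < 2:
            continue  # the ratio is undefined for the single-block partition
        value = Fraction(len(P) * H(M) - sum(H(S) for S in P), len(P) - 1)
        if best_value is None or value > best_value:
            best_value, best_partition = value, P
    return best_value, best_partition


rate, partition = min_sum_rate()
print(rate, partition)
```

For this instance the maximum equals 3 packets, matching the point at which the lines of the Dilworth truncation for the same instance intersect.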

Appendix G 
Proof of Theorem 6 

We prove this theorem by showing that for any rate tuple $\mathbf{R}$ that belongs to the rate region $\mathcal{R}_{de}$ 
defined earlier, there exists a network coding solution to the data exchange problem. 

In the data exchange problem, each of the $m$ users observes some collection of linear combinations 
of the data packets $w_1, w_2, \ldots, w_N$: 

$$\mathbf{X}_i = \mathbf{A}_i \mathbf{W}, \quad \forall i \in \mathcal{M}, \tag{97}$$

where $\mathbf{A}_i \in \mathbb{F}_q^{\ell_i \times N}$ and $\mathbf{W} = [w_1 \ w_2 \ \cdots \ w_N]^T$. 

Since each user is interested in recovering all the data packets $\mathbf{W}$, one can convert the data exchange 
problem into a multicast network problem. For instance, considering user 1 as a receiver (see 
Figure 11), it obtains the side information from itself (thus the link of capacity $\ell_1$ from user 1 to user 
1), and it receives transmissions from the other users through the links of capacities $R_i$, $i = 2, 3, \ldots, m$. 
However, in order to set up the problem this way it is necessary to know how many symbols in $\mathbb{F}_{q^n}$ each user 
broadcasts, i.e., we need to know the capacities $R_i$ of the links. 

In [22], the authors proved necessary and sufficient conditions for the existence of a network coding 
solution when the sources are linearly correlated. In the following lemma we state their result adapted 
to the data exchange problem with linearly coded packets. Let us denote by $\mathbf{A}_j(S_i, *)$ a sub-matrix of 
$\mathbf{A}_j$ with rows indexed by the elements of the set $S_i$. 

Fig. 11. The data exchange problem can be interpreted as a multicast problem. Considering user 1 as a receiver, it obtains the side 
information from itself through the link of capacity $\ell_1$, and it receives transmissions from the other users through the links 
of capacities $R_i$, $i = 2, 3, \ldots, m$. 

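Lemma 11. In the data exchange problem with linearly coded packets, a rate tuple $(R_1, R_2, \ldots, R_m)$ 
can be achieved by network coding if and only if 

$$\mathrm{rank}\left( \mathbf{A}_1, \mathbf{A}_2(S_2^{(1)}, *), \ldots, \mathbf{A}_m(S_m^{(1)}, *) \right) = N, \tag{98}$$
$$\mathrm{rank}\left( \mathbf{A}_1(S_1^{(2)}, *), \mathbf{A}_2, \ldots, \mathbf{A}_m(S_m^{(2)}, *) \right) = N, \tag{99}$$
$$\vdots$$
$$\mathrm{rank}\left( \mathbf{A}_1(S_1^{(m)}, *), \ldots, \mathbf{A}_{m-1}(S_{m-1}^{(m)}, *), \mathbf{A}_m \right) = N, \tag{100}$$

such that $|S_i^{(j)}| = R_i$, $\forall j \in \{1, 2, \ldots, m\}$, $\forall i \in \{1, 2, \ldots, m\} \setminus \{j\}$, where $S_i^{(j)} \subseteq \{1, 2, \ldots, \ell_i\}$. 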

Each equation in (98)-(100) corresponds to the selection of $N$ disjoint paths from the users 1 through 
$m$ to one of the receiving users (see Figure 11, where user 1 is the receiving user). Hence, for a rate 
tuple $(R_1, R_2, \ldots, R_m)$ that satisfies the conditions in Lemma 11 there exists a network coding solution 
to the data exchange problem. Now, let us consider the equations (98) through (100). The idea is to 
identify the set of all achievable solutions for each receiver, i.e., the goal is to find the collection of sets 
$\{S_i^{(j)}\}_{i \neq j}$ for each $j \in \{1, 2, \ldots, m\}$ which satisfy the conditions of the $j$-th row in (98)-(100). To 
that end, let us consider Algorithm 4. 






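Algorithm 4 Greedy Algorithm 
1: Initialize $j(1) \in \{1, 2, \ldots, m\}$, $\mathbf{S} = \mathbf{A}_{j(1)}$. 
2: Let $j(2), j(3), \ldots, j(m)$ be any ordering of $\{1, 2, \ldots, m\} \setminus \{j(1)\}$. 
3: for $i = 2$ to $m$ do 
4:   Initialize $S_{j(i)}^{(j(1))} = \emptyset$ 
5:   for $k = 1$ to $\ell_{j(i)}$ do 
6:     if $\mathrm{rank}(\mathbf{S}, \mathbf{A}_{j(i)}(k, *)) = \mathrm{rank}\{\mathbf{S}\} + \mathrm{rank}\{\mathbf{A}_{j(i)}(k, *)\}$ then 
7:       $S_{j(i)}^{(j(1))} = S_{j(i)}^{(j(1))} \cup \{k\}$, and append the row $\mathbf{A}_{j(i)}(k, *)$ to $\mathbf{S}$ 
8:     end if 
9:   end for 
10: end for 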



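It is not hard to conclude that Algorithm 4 satisfies the maximum rank property, i.e., for every $j(1) \in 
\{1, 2, \ldots, m\}$ it holds that 

$$\mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}(S_{j(2)}^{(j(1))}, *), \ldots, \mathbf{A}_{j(i)}(S_{j(i)}^{(j(1))}, *) \right) = \mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(i)} \right), \quad i = 2, 3, \ldots, m. \tag{101}$$

Therefore, for one particular ordering $j(1), j(2), \ldots, j(m)$ of $1, 2, \ldots, m$, we have that 

$$|S_{j(i)}^{(j(1))}| = \mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(i)} \right) - \mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(i-1)} \right), \quad i = 2, 3, \ldots, m. \tag{102}$$

From (102) it follows that 

$$\begin{aligned} \sum_{i=t}^{m} |S_{j(i)}^{(j(1))}| &= \mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(m)} \right) - \mathrm{rank}\left( \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(t-1)} \right) \\ &= \mathrm{rank}\left( \mathbf{A}_{j(t)}, \mathbf{A}_{j(t+1)}, \ldots, \mathbf{A}_{j(m)} \mid \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(t-1)} \right), \quad t = 2, 3, \ldots, m. \end{aligned} \tag{103}$$

Since the feasibility condition has to be satisfied for any ordering, we conclude that if for every ordering 
$j(1), j(2), \ldots, j(m)$ of $1, 2, \ldots, m$ 

$$\sum_{i=t}^{m} R_{j(i)} \geq \mathrm{rank}\left( \mathbf{A}_{j(t)}, \mathbf{A}_{j(t+1)}, \ldots, \mathbf{A}_{j(m)} \mid \mathbf{A}_{j(1)}, \mathbf{A}_{j(2)}, \ldots, \mathbf{A}_{j(t-1)} \right), \quad t = 2, 3, \ldots, m, \tag{104}$$

then $(R_1, R_2, \ldots, R_m)$ can be achieved by network coding. It is not hard to see that the rate region in 
(104) is equivalent to 

$$\sum_{i \in S} R_i \geq \mathrm{rank}\left( \mathbf{A}_S \mid \mathbf{A}_{S^c} \right), \quad \forall S \subset \{1, 2, \ldots, m\}. \tag{105}$$

Thus, we showed that the cut-set bounds (105) for the data exchange problem with linearly coded packets 
can be achieved via network coding. 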
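
The greedy row selection of Algorithm 4 and the properties (101) and (102) can be exercised on a toy instance. The following is a minimal sketch under assumptions not made in the paper: the field is taken to be GF(2), each row of $\mathbf{A}_i$ is stored as an integer bitmask, the matrices in `A` are hypothetical, rows are assumed to be nonzero, and row indices start at 0 instead of 1.

```python
from itertools import permutations

# Hypothetical example over GF(2) with N = 4 packets. Each row is a bitmask:
# bit t set means that packet w_{t+1} appears in the linear combination.
A = {
    0: [0b0001, 0b0010],
    1: [0b0110, 0b0011],
    2: [0b1000, 0b1100, 0b0100],
}


def reduce_row(row, basis):
    """Reduce `row` against an XOR basis stored as {leading bit: basis vector}."""
    while row:
        lead = row.bit_length() - 1
        if lead not in basis:
            return row
        row ^= basis[lead]
    return 0


def rank(rows):
    """Rank over GF(2) of the matrix whose rows are the given bitmasks."""
    basis = {}
    for row in rows:
        reduced = reduce_row(row, basis)
        if reduced:
            basis[reduced.bit_length() - 1] = reduced
    return len(basis)


def greedy_selection(order):
    """Run the greedy selection for receiver order[0]; return {user: selected row indices}."""
    stacked = list(A[order[0]])
    selected = {}
    for user in order[1:]:
        selected[user] = []
        for k, row in enumerate(A[user]):
            # Keep row k only if it increases the rank of the rows collected so far.
            if rank(stacked + [row]) == rank(stacked) + rank([row]):
                selected[user].append(k)
                stacked.append(row)
    return selected


def max_rank_property(order, selected):
    """Check (101) and (102) for one ordering."""
    full, chosen = list(A[order[0]]), list(A[order[0]])
    for user in order[1:]:
        before = rank(full)
        full += A[user]
        chosen += [A[user][k] for k in selected[user]]
        if rank(chosen) != rank(full):
            return False
        if len(selected[user]) != rank(full) - before:
            return False
    return True


all_orderings_ok = all(
    max_rank_property(order, greedy_selection(order)) for order in permutations(A)
)
print(greedy_selection((0, 1, 2)), all_orderings_ok)
```

With user 0 as the receiver and the ordering (0, 1, 2), one row from user 1 and one row from user 2 suffice to reach rank $N = 4$ in this example.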